Clean up after recent fixes, move address calculations
around and change the variable init, so that we can have
just one start_offset == end_offset check.
Make the check a little stricter to preserve the -EINVAL
error if requested start offset is larger than the region
itself.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a few fentry/fexit programs returning non-0.
The tests with these programs will break with the previous
patch which enfoced return-0 rules. Fix them properly.
Fixes: ac065870d9 ("selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200514053207.1298479-1-yhs@fb.com
Flower tests used to create ingress filter with specified parent qdisc
"parent ffff:" but dump them on "ingress". With recent commit that fixed
tcm_parent handling in dump those are not considered same parent anymore,
which causes iproute2 tc to emit additional "parent ffff:" in first line of
filter dump output. The change in output causes filter match in tests to
fail.
Prevent parent qdisc output when dumping filters in flower tests by always
correctly specifying "ingress" parent both when creating and dumping
filters.
Fixes: a7df4870d7 ("net_sched: fix tcm_parent in tc filter dump")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mmap() subsystem allows user-space application to memory-map region with
initial page offset. This wasn't taken into account in initial implementation
of BPF array memory-mapping. This would result in wrong pages, not taking into
account requested page shift, being memory-mmaped into user-space. This patch
fixes this gap and adds a test for such scenario.
Fixes: fc9702273e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512235925.3817805-1-andriin@fb.com
Commit 6879c042e1 ("tools/bpf: selftests: Add bpf_iter selftests")
added self tests for bpf_iter feature. But two subtests
ipv6_route and netlink needs llvm latest 10.x release branch
or trunk due to a bug in llvm BPF backend. This patch added
the file README.rst to document these two failures
so people using llvm 10.0.0 can be aware of them.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200513180215.2949237-1-yhs@fb.com
It is sometimes desirable to be able to trigger BPF program from user-space
with minimal overhead. sys_enter would seem to be a good candidate, yet in
a lot of cases there will be a lot of noise from syscalls triggered by other
processes on the system. So while searching for low-overhead alternative, I've
stumbled upon getpgid() syscall, which seems to be specific enough to not
suffer from accidental syscall by other apps.
This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe,
fentry and fmod_ret with returning error (so that syscall would not be
executed), to determine the lowest-overhead way. Here are results on my
machine (using benchs/run_bench_trigger.sh script):
base : 9.200 ± 0.319M/s
tp : 6.690 ± 0.125M/s
rawtp : 8.571 ± 0.214M/s
kprobe : 6.431 ± 0.048M/s
fentry : 8.955 ± 0.241M/s
fmodret : 8.903 ± 0.135M/s
So it seems like fmodret doesn't give much benefit for such lightweight
syscall. Raw tracepoint is pretty decent despite additional filtering logic,
but it will be called for any other syscall in the system, which rules it out.
Fentry, though, seems to be adding the least amoung of overhead and achieves
97.3% of performance of baseline no-BPF-attached syscall.
Using getpgid() seems to be preferable to set_task_comm() approach from
test_overhead, as it's about 2.35x faster in a baseline performance.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
Add fmod_ret BPF program to existing test_overhead selftest. Also re-implement
user-space benchmarking part into benchmark runner to compare results. Results
with ./bench are consistently somewhat lower than test_overhead's, but relative
performance of various types of BPF programs stay consisten (e.g., kretprobe is
noticeably slower). This slowdown seems to be coming from the fact that
test_overhead is single-threaded, while benchmark always spins off at least
one thread for producer. This has been confirmed by hacking multi-threaded
test_overhead variant and also single-threaded bench variant. Resutls are
below. run_bench_rename.sh script from benchs/ subdirectory was used to
produce results for ./bench.
Single-threaded implementations
===============================
/* bench: single-threaded, atomics */
base : 4.622 ± 0.049M/s
kprobe : 3.673 ± 0.052M/s
kretprobe : 2.625 ± 0.052M/s
rawtp : 4.369 ± 0.089M/s
fentry : 4.201 ± 0.558M/s
fexit : 4.309 ± 0.148M/s
fmodret : 4.314 ± 0.203M/s
/* selftest: single-threaded, no atomics */
task_rename base 4555K events per sec
task_rename kprobe 3643K events per sec
task_rename kretprobe 2506K events per sec
task_rename raw_tp 4303K events per sec
task_rename fentry 4307K events per sec
task_rename fexit 4010K events per sec
task_rename fmod_ret 3984K events per sec
Multi-threaded implementations
==============================
/* bench: multi-threaded w/ atomics */
base : 3.910 ± 0.023M/s
kprobe : 3.048 ± 0.037M/s
kretprobe : 2.300 ± 0.015M/s
rawtp : 3.687 ± 0.034M/s
fentry : 3.740 ± 0.087M/s
fexit : 3.510 ± 0.009M/s
fmodret : 3.485 ± 0.050M/s
/* selftest: multi-threaded w/ atomics */
task_rename base 3872K events per sec
task_rename kprobe 3068K events per sec
task_rename kretprobe 2350K events per sec
task_rename raw_tp 3731K events per sec
task_rename fentry 3639K events per sec
task_rename fexit 3558K events per sec
task_rename fmod_ret 3511K events per sec
/* selftest: multi-threaded, no atomics */
task_rename base 3945K events per sec
task_rename kprobe 3298K events per sec
task_rename kretprobe 2451K events per sec
task_rename raw_tp 3718K events per sec
task_rename fentry 3782K events per sec
task_rename fexit 3543K events per sec
task_rename fmod_ret 3526K events per sec
Note that the fact that ./bench benchmark always uses atomic increments for
counting, while test_overhead doesn't, doesn't influence test results all that
much.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-4-andriin@fb.com
While working on BPF ringbuf implementation, testing, and benchmarking, I've
developed a pretty generic and modular benchmark runner, which seems to be
generically useful, as I've already used it for one more purpose (testing
fastest way to trigger BPF program, to minimize overhead of in-kernel code).
This patch adds generic part of benchmark runner and sets up Makefile for
extending it with more sets of benchmarks.
Benchmarker itself operates by spinning up specified number of producer and
consumer threads, setting up interval timer sending SIGALARM signal to
application once a second. Every second, current snapshot with hits/drops
counters are collected and stored in an array. Drops are useful for
producer/consumer benchmarks in which producer might overwhelm consumers.
Once test finishes after given amount of warm-up and testing seconds, mean and
stddev are calculated (ignoring warm-up results) and is printed out to stdout.
This setup seems to give consistent and accurate results.
To validate behavior, I added two atomic counting tests: global and local.
For global one, all the producer threads are atomically incrementing same
counter as fast as possible. This, of course, leads to huge drop of
performance once there is more than one producer thread due to CPUs fighting
for the same memory location.
Local counting, on the other hand, maintains one counter per each producer
thread, incremented independently. Once per second, all counters are read and
added together to form final "counting throughput" measurement. As expected,
such setup demonstrates linear scalability with number of producers (as long
as there are enough physical CPU cores, of course). See example output below.
Also, this setup can nicely demonstrate disastrous effects of false sharing,
if care is not taken to take those per-producer counters apart into
independent cache lines.
Demo output shows global counter first with 1 producer, then with 4. Both
total and per-producer performance significantly drop. The last run is local
counter with 4 producers, demonstrating near-perfect scalability.
$ ./bench -a -w1 -d2 -p1 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 24.822us): hits 148.179M/s (148.179M/prod), drops 0.000M/s
Iter 1 ( 37.939us): hits 149.308M/s (149.308M/prod), drops 0.000M/s
Iter 2 (-10.774us): hits 150.717M/s (150.717M/prod), drops 0.000M/s
Iter 3 ( 3.807us): hits 151.435M/s (151.435M/prod), drops 0.000M/s
Summary: hits 150.488 ± 1.079M/s (150.488M/prod), drops 0.000 ± 0.000M/s
$ ./bench -a -w1 -d2 -p4 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 60.659us): hits 53.910M/s ( 13.477M/prod), drops 0.000M/s
Iter 1 (-17.658us): hits 53.722M/s ( 13.431M/prod), drops 0.000M/s
Iter 2 ( 5.865us): hits 53.495M/s ( 13.374M/prod), drops 0.000M/s
Iter 3 ( 0.104us): hits 53.606M/s ( 13.402M/prod), drops 0.000M/s
Summary: hits 53.608 ± 0.113M/s ( 13.402M/prod), drops 0.000 ± 0.000M/s
$ ./bench -a -w1 -d2 -p4 count-local
Setting up benchmark 'count-local'...
Benchmark 'count-local' started.
Iter 0 ( 23.388us): hits 640.450M/s (160.113M/prod), drops 0.000M/s
Iter 1 ( 2.291us): hits 605.661M/s (151.415M/prod), drops 0.000M/s
Iter 2 ( -6.415us): hits 607.092M/s (151.773M/prod), drops 0.000M/s
Iter 3 ( -1.361us): hits 601.796M/s (150.449M/prod), drops 0.000M/s
Summary: hits 604.849 ± 2.739M/s (151.212M/prod), drops 0.000 ± 0.000M/s
Benchmark runner supports setting thread affinity for producer and consumer
threads. You can use -a flag for default CPU selection scheme, where first
consumer gets CPU #0, next one gets CPU #1, and so on. Then producer threads
pick up next CPU and increment one-by-one as well. But user can also specify
a set of CPUs independently for producers and consumers with --prod-affinity
1,2-10,15 and --cons-affinity <set-of-cpus>. The latter allows to force
producers and consumers to share same set of CPUs, if necessary.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-3-andriin@fb.com
Add testing_helpers.c, which will contain generic helpers for test runners and
tests needing some common generic functionality, like parsing a set of
numbers.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-2-andriin@fb.com
This is basically a test-suite for setns() and as of now contains:
- test that we can't pass garbage flags
- test that we can't attach to the namespaces of task that has already exited
- test that we can incrementally setns into all namespaces of a target task
using a pidfd
- test that we can setns atomically into all namespaces of a target task
- test that we can't cross setns into a user namespace outside of our user
namespace hierarchy
- test that we can't setns into namespaces owned by user namespaces over which
we are not privileged
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200505140432.181565-4-christian.brauner@ubuntu.com
Before commit 74b5a5968f ("selftests/bpf: Replace test_progs and
test_maps w/ general rule") selftests/bpf used generic install
target from selftests/lib.mk to install generated bpf test progs
by mentioning them in TEST_GEN_FILES variable.
Take that functionality back.
Fixes: 74b5a5968f ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513021722.7787-1-yauheni.kaliuta@redhat.com
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200507185057.GA13981@embeddedor
The added test includes the following subtests:
- test verifier change for btf_id_or_null
- test load/create_iter/read for
ipv6_route/netlink/bpf_map/task/task_file
- test anon bpf iterator
- test anon bpf iterator reading one char at a time
- test file bpf iterator
- test overflow (single bpf program output not overflow)
- test overflow (single bpf program output overflows)
- test bpf prog returning 1
The ipv6_route tests the following verifier change
- access fields in the variable length array of the structure.
The netlink load tests the following verifier change
- put a btf_id ptr value in a stack and accessible to
tracing/iter programs.
The anon bpf iterator also tests link auto attach through skeleton.
$ test_progs -n 2
#2/1 btf_id_or_null:OK
#2/2 ipv6_route:OK
#2/3 netlink:OK
#2/4 bpf_map:OK
#2/5 task:OK
#2/6 task_file:OK
#2/7 anon:OK
#2/8 anon-read-one-char:OK
#2/9 file:OK
#2/10 overflow:OK
#2/11 overflow-e2big:OK
#2/12 prog-ret-1:OK
#2 bpf_iter:OK
Summary: 1/12 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175923.2477637-1-yhs@fb.com
The implementation is arbitrary, just to show how the bpf programs
can be written for bpf_map/task/task_file. They can be costomized
for specific needs.
For example, for bpf_map, the iterator prints out:
$ cat /sys/fs/bpf/my_bpf_map
id refcnt usercnt locked_vm
3 2 0 20
6 2 0 20
9 2 0 20
12 2 0 20
13 2 0 20
16 2 0 20
19 2 0 20
%%% END %%%
For task, the iterator prints out:
$ cat /sys/fs/bpf/my_task
tgid gid
1 1
2 2
....
1944 1944
1948 1948
1949 1949
1953 1953
=== END ===
For task/file, the iterator prints out:
$ cat /sys/fs/bpf/my_task_file
tgid gid fd file
1 1 0 ffffffff95c97600
1 1 1 ffffffff95c97600
1 1 2 ffffffff95c97600
....
1895 1895 255 ffffffff95c8fe00
1932 1932 0 ffffffff95c8fe00
1932 1932 1 ffffffff95c8fe00
1932 1932 2 ffffffff95c8fe00
1932 1932 3 ffffffff95c185c0
This is able to print out all open files (fd and file->f_op), so user can compare
f_op against a particular kernel file operations to find what it is.
For example, from /proc/kallsyms, we can find
ffffffff95c185c0 r eventfd_fops
so we will know tgid 1932 fd 3 is an eventfd file descriptor.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175922.2477576-1-yhs@fb.com
Make sure that the drive restricts incorrect order of inserted matchall
vs. flower rules.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Check that matchall rules with sample actions are not possible to be
inserted to egress.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The file is about to contain matchall restrictions too, so change the
name to make it more generic.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We want to have a tighter control on what ports we bind to in
the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means
connect() becomes slightly more expensive. The expensive part
comes from the fact that we now need to call inet_csk_get_port()
that verifies that the port is not used and allocates an entry
in the hash table for it.
Since we can't rely on "snum || !bind_address_no_port" to prevent
us from calling POST_BIND hook anymore, let's add another bind flag
to indicate that the call site is BPF program.
v5:
* fix wrong AF_INET (should be AF_INET6) in the bpf program for v6
v3:
* More bpf_bind documentation refinements (Martin KaFai Lau)
* Add UDP tests as well (Martin KaFai Lau)
* Don't start the thread, just do socket+bind+listen (Martin KaFai Lau)
v2:
* Update documentation (Andrey Ignatov)
* Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-5-sdf@google.com
1. Move pkt_v4 and pkt_v6 into network_helpers and adjust the users.
2. Copy-paste spin_lock_thread into two tests that use it.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-3-sdf@google.com
Move the following routines that let us start a background listener
thread and connect to a server by fd to the test_prog:
* start_server - socket+bind+listen
* connect_to_fd - connect to the server identified by fd
These will be used in the next commit.
Also, extend these helpers to support AF_INET6 and accept the family
as an argument.
v5:
* drop pthread.h (Martin KaFai Lau)
* add SO_SNDTIMEO (Martin KaFai Lau)
v4:
* export extra helper to start server without a thread (Martin KaFai Lau)
* tcp_rtt is no longer starting background thread (Martin KaFai Lau)
v2:
* put helpers into network_helpers.c (Andrii Nakryiko)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-2-sdf@google.com
shellcheck complains that egrep is deprecated, and the grep man page
agrees. Use grep -E instead.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
It is Very Rude to clear dmesg in test scripts. That's because the
script may be part of a larger test run, and clearing dmesg
potentially destroys the output of other tests.
We can avoid using dmesg -c by saving the content of dmesg before the
test, and then using diff to compare that to the dmesg afterward,
producing a log with just the added lines.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The UNRESOLVED state is much more apporiate than the UNSUPPORTED state
for the absence of the test module, as it matches "test was set up
incorrectly" situation in the README file.
A possible scenario is that the function was enabled (supported by the
kernel) but the module was not installed properly, in this case we
cannot call this as UNSUPPORTED.
This change also make it consistent with other module-related tests
in ftrace.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Merge misc fixes from Andrew Morton:
"14 fixes and one selftest to verify the ipc fixes herein"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: limit boost_watermark on small zones
ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
epoll: atomically remove wait entry on wake up
kselftests: introduce new epoll60 testcase for catching lost wakeups
percpu: make pcpu_alloc() aware of current gfp context
mm/slub: fix incorrect interpretation of s->offset
scripts/gdb: repair rb_first() and rb_last()
eventpoll: fix missing wakeup for ovflist in ep_poll_callback
arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
scripts/decodecode: fix trapping instruction formatting
kernel/kcov.c: fix typos in kcov_remote_start documentation
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
mm, memcg: fix error return value of mem_cgroup_css_alloc()
ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
When I added the expected error testing, I forgot I need to set
the return to zero when we successfully see an error.
Without this change we only end up testing a single heap
before the test quits.
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Brian Starkey <brian.starkey@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: "Andrew F. Davis" <afd@ti.com>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD),
DR6 was incorrectly copied from the value in the VM. Instead,
DR6.BD should be set in order to catch this case.
On AMD this does not need any special code because the processor triggers
a #DB exception that is intercepted. However, the testcase would fail
without the previous patch because both DR6.BS and DR6.BD would be set.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The documentation for UBSAN_ALIGNMENT already mentions that it should
not be used on all*config builds (and for efficient-unaligned-access
architectures), so just refactor the Kconfig to correctly implement this
so randconfigs will stop creating insane images that freak out objtool
under CONFIG_UBSAN_TRAP (due to the false positives producing functions
that never return, etc).
Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook
Fixes: 0887a7ebc9 ("ubsan: add trap instrumentation option")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This test case catches lost wake up introduced by commit 339ddb53d3
("fs/epoll: remove unnecessary wakeups of nested epoll")
The test is simple: we have 10 threads and 10 event fds. Each thread
can harvest only 1 event. 1 producer fires all 10 events at once and
waits that all 10 events will be observed by 10 threads.
In case of lost wakeup epoll_wait() will timeout and 0 will be returned.
Test case catches two sort of problems: forgotten wakeup on event, which
hits the ->ovflist list, this problem was fixed by:
5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback")
the other problem is when several sequential events hit the same waiting
thread, thus other waiters get no wakeups. Problem is fixed in the
following patch.
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This Kselftest update for Linux 5.7-rc5 consists of ftrace test fixes
and fix to kvm Makefile for relocatable native/cross builds and installs.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl60SCYACgkQCwJExA0N
QxzlTQ/+LWXBGbHJPPOT2UPjoa+f5JQgYePxTVbnnG+nG4SQcbHMBGMORwbz7S8e
o/dc2H+jhfYNnTc2L/JoruJVT7E2CwehBKTZzqCu6AbBWdPNgQCmxFIh5Gos8JMb
cj9J/7CEtnOAQF7IArGYlyd9gblRc8ACwvgVLRJbKcNFHV0EBRV2Kg5JOE9RaKPK
xIA2GNb30cA3P4IFykc199zrOsfwf2Dr0Bf1wf3HkO1UnhrFIDsT7PRNBLGWWB+n
c9Mi9fFmLEJr3yDL6B87Mt7+sUGRb7Jvvz6h8gIhJaE8IpbkD9nq0PhuuQNawFy4
KS/FYFtRShKU/m+g+Nudg9PJR+vzpL/wR6kp/5Z1W5Us9ba200GZBtFT0x+NUP3n
fNczO61iwAi4qesfCpKgDfGco754mw1+hUZAdRJkYY6tyZmjrPVpVKgOR8kN4qjP
anUl1trQsDM5kaScG/6RmBhoAtaU91Mdhcw1/8YLBpoQUQEMxWtwlXdvKkJ3zjfE
Gs0jBoThuCHAj3IMZmL3dPaa88IA904aelTW6OLnq8BmHEW2938Y/T1L47bHO5C+
WZnFVzNgEqFTLHqX/MYzmoJ1y+A09fA3Owi6OmsvJ5ZY+LjEiKGXuBHj+VhbwJ4G
COphD9CcIVx0gQ3Zct8feajloQcGu5v4/El3/3kPQkuZjzhxRq0=
=jUcW
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"ftrace test fixes and a fix to kvm Makefile for relocatable
native/cross builds and installs"
* tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: fix kvm relocatable native/cross builds and installs
selftests/ftrace: Make XFAIL green color
ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set
ftrace/selftests: workaround cgroup RT scheduling issues
fixes.2020.04.27a: Miscellaneous fixes.
kfree_rcu.2020.04.27a: Changes related to kfree_rcu().
rcu-tasks.2020.04.27a: Addition of new RCU-tasks flavors.
stall.2020.04.27a: RCU CPU stall-warning updates.
torture.2020.05.07a: Torture-test updates.
Make it a bit easier to apply KASAN to rcutorture runs with a new --kasan
argument, again leveraging the config_override_param() bash function.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit saves a few lines of code by also using the bash
config_override_param() to set the initial list of Kconfig options from
the CFcommon file. While in the area, it makes this function capable of
update-in-place on the file containing the cumulative Kconfig options,
thus avoiding annoying changes when adding another source of options.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit applies config_override_param() to allow scenario-specific
Kconfig options to override those in CFcommon. This in turn will allow
additional Kconfig options to be placed in CFcommon, for example, an
option common to all but a few scenario can be placed in CFcommon and
then overridden in those few scenarios. Plus this change saves one
whole line of code.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, attempting to override a --kcsan default with a --kconfig
option might or might not work. However, it would be good to allow the
user to adjust the --kcsan defaults, for example, to specify a different
time for CONFIG_KCSAN_REPORT_ONCE_IN_MS. This commit therefore uses the
new config_override_param() bash function to apply the --kcsan defaults
and then apply the --kconfig options, which allows this overriding
to occur.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit introduces a config_override_param() bash function that
folds in an additional set of Kconfig options. This is initially applied
to fold in the --kconfig kvm.sh parameter, but later commits will also
apply it to the Kconfig options added by the --kcsan kvm.sh parameter.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The #CHECK# directives that can be present in CFcommon and in the
rcutorture scenario Kconfig files are both copied to ConfigFragment
and grepped out of the two directive files and added to ConfigFragment.
This commit therefore removes the redundant "grep" commands and takes
advantage of the consequent opportunity to simplify redirection.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The KCSAN tool emits a great many warnings for current kernels, for
example, a one-hour run of the full set of rcutorture scenarios results
in no fewer than 3252 such warnings, many of which are duplicates
or are otherwise closely related. This commit therefore introduces
a kcsan-collapse.sh script that maps these warnings down to a set of
function pairs (22 of them given the 3252 individual warnings), placing
the resulting list in decreasing order of frequency of occurrence into
a kcsan.sum file. If any KCSAN warnings were produced, the pathname of
this file is emitted at the end of the summary of the rcutorture runs.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Although the existing --kconfig argument can be used to run KCSAN for
an rcutorture test, it is not as straightforward as one might like:
--kconfig "CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y \
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n \
CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n \
CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 \
CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y"
This commit therefore adds a "--kcsan" argument that emulates the above
--kconfig command. Note that if you specify a Kconfig option using
-kconfig that conflicts with one that --kcsan adds, you get whatever
the script and the build system decide to give you.
Cc: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The number of CPUs is tuned to allow "4*CFLIST TREE10" on a large system,
up from "3*CFLIST TREE10" previously.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6yqbIUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroObBQf+NH9DCs6X92YggAoNpJl6uSIOX35X
ErdWqYj80Xx95QU73aMukjs3Zqxe6WfYI9jPEOD8SDUZzZlVfIA35D8BYlqt1c5R
A2K2ebTQbZ+j487QTUPbEvEivyxyVSozwvOdKBfL5kv0D9Cn2STyjVjmguUoCp9n
VztmwbwpSZdOnexRSolwAWuyOriYbvpV12cIZpcMGrjL67yZPv8UyCxxJplDCLlB
1c8tvGI2Md8apE/YZDqlCFh3H4YBQsact8uOoyY8cXKO/xIAsZOI+Dhm/cQAhGDk
QIQqv/hkM4HPvOXQluwIau4Cx+Fl05xY/ggtQt4z/8yml2pOw8PKmwziZA==
=60QX
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, mostly for ARM and AMD, and more documentation.
Slightly bigger than usual because I couldn't send out what was
pending for rc4, but there is nothing worrisome going on. I have more
fixes pending for guest debugging support (gdbstub) but I will send
them next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
KVM: selftests: Fix build for evmcs.h
kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
docs/virt/kvm: Document configuring and running nested guests
KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
KVM: x86: Fixes posted interrupt check for IRQs delivery modes
KVM: SVM: fill in kvm_run->debug.arch.dr[67]
KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
KVM: arm64: Fix 32bit PC wrap-around
KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
KVM: arm64: Save/restore sp_el0 as part of __guest_enter
KVM: arm64: Delete duplicated label in invalid_vector
KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
...
Covers fundamental tests for KVM_SET_GUEST_DEBUG. It is very close to the debug
test in kvm-unit-test, but doing it from outside the guest.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200505205000.188252-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull networking fixes from David Miller:
1) Fix reference count leaks in various parts of batman-adv, from Xiyu
Yang.
2) Update NAT checksum even when it is zero, from Guillaume Nault.
3) sk_psock reference count leak in tls code, also from Xiyu Yang.
4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in
fq_codel, from Eric Dumazet.
5) Fix panic in choke_reset(), also from Eric Dumazet.
6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan.
7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet.
8) Fix crash in x25_disconnect(), from Yue Haibing.
9) Don't pass pointer to local variable back to the caller in
nf_osf_hdr_ctx_init(), from Arnd Bergmann.
10) Wireguard should use the ECN decap helper functions, from Toke
Høiland-Jørgensen.
11) Fix command entry leak in mlx5 driver, from Moshe Shemesh.
12) Fix uninitialized variable access in mptcp's
subflow_syn_recv_sock(), from Paolo Abeni.
13) Fix unnecessary out-of-order ingress frame ordering in macsec, from
Scott Dial.
14) IPv6 needs to use a global serial number for dst validation just
like ipv4, from David Ahern.
15) Fix up PTP_1588_CLOCK deps, from Clay McClure.
16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from
Yoshiyuki Kurauchi.
17) Fix a regression in that dsa user port errors should not be fatal,
from Florian Fainelli.
18) Fix iomap leak in enetc driver, from Dejin Zheng.
19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang.
20) Initialize protocol value earlier in neigh code paths when
generating events, from Roman Mashak.
21) netdev_update_features() must be called with RTNL mutex in macsec
driver, from Antoine Tenart.
22) Validate untrusted GSO packets even more strictly, from Willem de
Bruijn.
23) Wireguard decrypt worker needs a cond_resched(), from Jason
Donenfeld.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
wireguard: send/receive: cond_resched() when processing worker ringbuffers
wireguard: socket: remove errant restriction on looping to self
wireguard: selftests: use normal kernel stack size on ppc64
net: ethernet: ti: am65-cpsw-nuss: fix irqs type
ionic: Use debugfs_create_bool() to export bool
net: dsa: Do not leave DSA master with NULL netdev_ops
net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
net: stricter validation of untrusted gso packets
seg6: fix SRH processing to comply with RFC8754
net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms
net: dsa: ocelot: the MAC table on Felix is twice as large
net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
selftests: net: tcp_mmap: fix SO_RCVLOWAT setting
net: hsr: fix incorrect type usage for protocol variable
net: macsec: fix rtnl locking issue
net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
...
It's already possible to create two different interfaces and loop
packets between them. This has always been possible with tunnels in the
kernel, and isn't specific to wireguard. Therefore, the networking stack
already needs to deal with that. At the very least, the packet winds up
exceeding the MTU and is discarded at that point. So, since this is
already something that happens, there's no need to forbid the not very
exceptional case of routing a packet back to the same interface; this
loop is no different than others, and we shouldn't special case it, but
rather rely on generic handling of loops in general. This also makes it
easier to do interesting things with wireguard such as onion routing.
At the same time, we add a selftest for this, ensuring that both onion
routing works and infinite routing loops do not crash the kernel. We
also add a test case for wireguard interfaces nesting packets and
sending traffic between each other, as well as the loop in this case
too. We make sure to send some throughput-heavy traffic for this use
case, to stress out any possible recursion issues with the locks around
workqueues.
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While at some point it might have made sense to be running these tests
on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on
64-bit powerpc in a long time, and more interesting things that we test
don't really work when we deviate from the default (16k). So, we stop
pushing our luck in this commit, and return to the default instead of
the minimum.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since chunk_size is no longer an integer, we can not
use it directly as an argument of setsockopt().
This patch should fix tcp_mmap for Big Endian kernels.
Fixes: 597b01edaf ("selftests: net: avoid ptl lock contention in tcp_mmap")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We added fields in tcp_zerocopy_receive structure,
so make sure to clear all fields to not pass garbage to the kernel.
We were lucky because recent additions added 'out' parameters,
still we need to clean our reference implementation, before folks
copy/paste it.
Fixes: c8856c0514 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
Fixes: 33946518d4 ("tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I got this error when building kvm selftests:
/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: multiple definition of `current_evmcs'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: first defined here
/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: multiple definition of `current_vp_assist'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: first defined here
I think it's because evmcs.h is included both in a test file and a lib file so
the structs have multiple declarations when linking. After all it's not a good
habit to declare structs in the header files.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200504220607.99627-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently users have to choose a free snapshot id before
calling DEVLINK_CMD_REGION_NEW. This is potentially racy
and inconvenient.
Make the DEVLINK_ATTR_REGION_SNAPSHOT_ID optional and try
to allocate id automatically. Send a message back to the
caller with the snapshot info.
Example use:
$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 1
$ id=$(devlink -j region new netdevsim/netdevsim1/dummy | \
jq '.[][][][]')
$ devlink region dump netdevsim/netdevsim1/dummy snapshot $id
[...]
$ devlink region del netdevsim/netdevsim1/dummy snapshot $id
v4:
- inline the notification code
v3:
- send the notification only once snapshot creation completed.
v2:
- don't wrap the line containing extack;
- add a few sentences to the docs.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add adjust_phase to ptp_clock_caps capability to allow
user to query if a PHC driver supports adjust phase with
ioctl PTP_CLOCK_GETCAPS command.
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-01 (v2)
The following pull-request contains BPF updates for your *net-next* tree.
We've added 61 non-merge commits during the last 6 day(s) which contain
a total of 153 files changed, 6739 insertions(+), 3367 deletions(-).
The main changes are:
1) pulled work.sysctl from vfs tree with sysctl bpf changes.
2) bpf_link observability, from Andrii.
3) BTF-defined map in map, from Andrii.
4) asan fixes for selftests, from Andrii.
5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub.
6) production cloudflare classifier as a selftes, from Lorenz.
7) bpf_ktime_get_*_ns() helper improvements, from Maciej.
8) unprivileged bpftool feature probe, from Quentin.
9) BPF_ENABLE_STATS command, from Song.
10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav.
11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs,
from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey pointed out that we can use reno instead of dctcp for CC
tests and drop CONFIG_TCP_CONG_DCTCP=y requirement.
Fixes: beecf11bc2 ("bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr")
Suggested-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200501224320.28441-1-sdf@google.com
Currently, bpf_getsockopt and bpf_setsockopt helpers operate on the
'struct bpf_sock_ops' context in BPF_PROG_TYPE_SOCK_OPS program.
Let's generalize them and make them available for 'struct bpf_sock_addr'.
That way, in the future, we can allow those helpers in more places.
As an example, let's expose those 'struct bpf_sock_addr' based helpers to
BPF_CGROUP_INET{4,6}_CONNECT hooks. That way we can override CC before the
connection is made.
v3:
* Expose custom helpers for bpf_sock_addr context instead of doing
generic bpf_sock argument (as suggested by Daniel). Even with
try_socket_lock that doesn't sleep we have a problem where context sk
is already locked and socket lock is non-nestable.
v2:
* s/BPF_PROG_TYPE_CGROUP_SOCKOPT/BPF_PROG_TYPE_SOCK_OPS/
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200430233152.199403-1-sdf@google.com
kvm test Makefile doesn't fully support cross-builds and installs.
UNAME_M = $(shell uname -m) variable is used to define the target
programs and libraries to be built from arch specific sources in
sub-directories.
For cross-builds to work, UNAME_M has to map to ARCH and arch specific
directories and targets in this Makefile.
UNAME_M variable to used to run the compiles pointing to the right arch
directories and build the right targets for these supported architectures.
TEST_GEN_PROGS and LIBKVM are set using UNAME_M variable.
LINUX_TOOL_ARCH_INCLUDE is set using ARCH variable.
x86_64 targets are named to include x86_64 as a suffix and directories
for includes are in x86_64 sub-directory. s390x and aarch64 follow the
same convention. "uname -m" doesn't result in the correct mapping for
s390x and aarch64. Fix it to set UNAME_M correctly for s390x and aarch64
cross-builds.
In addition, Makefile doesn't create arch sub-directories in the case of
relocatable builds and test programs under s390x and x86_64 directories
fail to build. This is a problem for native and cross-builds. Fix it to
create all necessary directories keying off of TEST_GEN_PROGS.
The following use-cases work with this change:
Native x86_64:
make O=/tmp/kselftest -C tools/testing/selftests TARGETS=kvm install \
INSTALL_PATH=$HOME/x86_64
arm64 cross-build:
make O=$HOME/arm64_build/ ARCH=arm64 HOSTCC=gcc \
CROSS_COMPILE=aarch64-linux-gnu- defconfig
make O=$HOME/arm64_build/ ARCH=arm64 HOSTCC=gcc \
CROSS_COMPILE=aarch64-linux-gnu- all
make kselftest-install TARGETS=kvm O=$HOME/arm64_build ARCH=arm64 \
HOSTCC=gcc CROSS_COMPILE=aarch64-linux-gnu-
s390x cross-build:
make O=$HOME/s390x_build/ ARCH=s390 HOSTCC=gcc \
CROSS_COMPILE=s390x-linux-gnu- defconfig
make O=$HOME/s390x_build/ ARCH=s390 HOSTCC=gcc \
CROSS_COMPILE=s390x-linux-gnu- all
make kselftest-install TARGETS=kvm O=$HOME/s390x_build/ ARCH=s390 \
HOSTCC=gcc CROSS_COMPILE=s390x-linux-gnu- all
No regressions in the following use-cases:
make -C tools/testing/selftests TARGETS=kvm
make kselftest-all TARGETS=kvm
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since XFAIL (Expected Failure) is expected to fail the test, which
means that test case works as we expected. IOW, XFAIL is same as
PASS. So make it green.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently, ftracetest will return 1 (failure) if any unresolved cases
are encountered. The unresolved status results from modules and
programs not being available, and as such does not indicate any
issues with ftrace itself. As such, change the behaviour of
ftracetest in line with unsupported cases; if unsupported cases
happen, ftracetest still returns 0 unless --fail-unsupported. Here
--fail-unresolved is added and the default is to return 0 if
unresolved results occur.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
wakeup_rt.tc and wakeup.tc tests in tracers/ subdirectory
fail due to the chrt command returning:
chrt: failed to set pid 0's policy: Operation not permitted.
To work around this, temporarily disable grout RT scheduling
during ftracetest execution. Restore original value on
test run completion. With these changes in place, both
tests consistently pass.
Fixes: c575dea2c1 ("selftests/ftrace: Add wakeup_rt tracer testcase")
Fixes: c1edd060b4 ("selftests/ftrace: Add wakeup tracer testcase")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This kselftest update for Linux 5.7-rc4 consists of:
- ftrace test fixes to check for required filter files and kprobe args.
- Kselftest build/cross-build dependency check script to make it easier
for test ring admins/users to configure build systems correctly for
build/cross-build kselftests. Currently checks library dependencies.
- Checks if Kselftests can be built/cross-built on a system running
compile test on a trivial C file with LDLIBS specified for each
individual test in their Makefiles.
- Prints suggested target list for a system filtering out tests
failed the build dependency check from the TARGETS in Selftests
the main Makefile when optional -p is specified.
- Prints pass/fail dependency check for each tests/sub-test.
- Prints pass/fail targets and libraries.
- Default: runs dependency checks on all tests.
- Optional test name can be specified to check dependencies for it.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl6rC9sACgkQCwJExA0N
QxwOqBAAuDgdAMKgks2ei8gc1A3NEeOCXmq5NtCeM5H80o+v5hALY4XeEHUAYpdI
/RC4XJ5sK8l/9AMhwoTxHdOjxGWGb7mznsNjUwqEicwpVxzpMRkcCfUavJdIWw3v
WkICPFuz5qj2ixdTvrYl1a3sHZr0yXAhoroI+Nbl4N2fhB80T9aZeE/mI6tgc4af
7pfSR6C4S+e0VocQeqof8uAMuUI4hJXg5nkSRLNLI6RHoTh4dE2Ozxhp+1KaW2P2
JivGD7mnqEn91r0Ai14uhvGQqJBFdLWhjqVhuSFf8I1cHBbTHu7ZEG+uARKYd8lP
SKF/AOkWxduBVQCY03gcynlhSjP6wN4/XIg/HMcS7KTyEQHB4nNavQY9XiHB2ARI
6f/1cKwA2VJMVbSlCMfC+E3/sIG05ht0XwiMhGapDy+B8pWKRiwsKn2GptG7R5th
b3y78Mkz4jK9OzC5rWY/tzOH9UVdI8sxSa3KXr1X2UDxX21R/XNzEVquj+zTEawo
jx63We3qLpzoh7d0K7hOCpAdZa6x7HpYGgc4ntUAttQLiALg7R+hEyltnsEd5WUs
w1hncEwcCGDw4QzIpOAvqlePZcqi+3C6ezo8bHhVJ/X5B6ikzzaB1pUiqpeeJfEj
J9Pus0+QQUdf19wJQwOjYcrxt8fmKRLPTHPQOAxOWzEnpFeGYzE=
=/gZD
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- ftrace test fixes to check for required filter files and kprobe args.
- Kselftest build/cross-build dependency check script to make it easier
for test ring admins/users to configure build systems correctly for
build/cross-build kselftests. Currently checks library dependencies.
- Checks if Kselftests can be built/cross-built on a system running
compile test on a trivial C file with LDLIBS specified for each
individual test in their Makefiles.
- Prints suggested target list for a system filtering out tests
failed the build dependency check from the TARGETS in Selftests
the main Makefile when optional -p is specified.
- Prints pass/fail dependency check for each tests/sub-test.
- Prints pass/fail targets and libraries.
- Default: runs dependency checks on all tests.
- Optional test name can be specified to check dependencies for it.
* tag 'linux-kselftest-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Check the first record for kprobe_args_type.tc
selftests: add build/cross-build dependency check script
selftests/ftrace: Check required filter files before running test
As agreed with Boris, merge in the 'x86/asm' branch from -tip so that we
can select the new 'ARCH_USE_SYM_ANNOTATIONS' Kconfig symbol, which is
required by the BTI kernel patches.
* 'x86/asm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Provide a Kconfig symbol for disabling old assembly annotations
x86/32: Remove CONFIG_DOUBLEFAULT
Check that verifier allows passing a map of type:
BPF_MAP_TYPE_REUSEPORT_SOCKARRARY, or
BPF_MAP_TYPE_SOCKMAP, or
BPF_MAP_TYPE_SOCKHASH
... to bpf_sk_select_reuseport helper.
Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200430104738.494180-1-jakub@cloudflare.com
Update bpf_sk_assign test to fetch the server socket from SOCKMAP, now that
map lookup from BPF in SOCKMAP is enabled. This way the test TC BPF program
doesn't need to know what address server socket is bound to.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200429181154.479310-4-jakub@cloudflare.com
Now that bpf_map_lookup_elem() is white-listed for SOCKMAP/SOCKHASH,
replace the tests which check that verifier prevents lookup on these map
types with ones that ensure that lookup operation is permitted, but only
with a release of acquired socket reference.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200429181154.479310-3-jakub@cloudflare.com
With recent changes, runqslower is being copied into selftests/bpf root
directory. So add it into .gitignore.
Fixes: b26d1e2b60 ("selftests/bpf: Copy runqslower to OUTPUT directory")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Veronika Kabatova <vkabatov@redhat.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-12-andriin@fb.com
If condition is inverted, but it's also just not necessary.
Fixes: 1c1052e014 ("tools/testing/selftests/bpf: Add self-tests for new helper bpf_get_ns_current_pid_tgid.")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Carlos Neira <cneirabustos@gmail.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-11-andriin@fb.com
AddressSanitizer assumes that all memory dereferences are done against memory
allocated by sanitizer's malloc()/free() code and not touched by anyone else.
Seems like this doesn't hold for perf buffer memory. Disable instrumentation
on perf buffer callback function.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-10-andriin@fb.com
Another one found by AddressSanitizer. input_len is bigger than actually
initialized data size.
Fixes: c7566a6969 ("selftests/bpf: Add field existence CO-RE relocs tests")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-8-andriin@fb.com
getline() allocates string, which has to be freed.
Fixes: 81f77fd0de ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-7-andriin@fb.com
Free test selector substrings, which were strdup()'ed.
Fixes: b65053cd94 ("selftests/bpf: Add whitelist/blacklist of test names to test_progs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-6-andriin@fb.com
Add ability to specify extra compiler flags with SAN_CFLAGS for compilation of
all user-space C files. This allows to build all of selftest programs with,
e.g., custom sanitizer flags, without requiring support for such sanitizers
from anyone compiling selftest/bpf.
As an example, to compile everything with AddressSanitizer, one would do:
$ make clean && make SAN_CFLAGS="-fsanitize=address"
For AddressSanitizer to work, one needs appropriate libasan shared library
installed in the system, with version of libasan matching what GCC links
against. E.g., GCC8 needs libasan5, while GCC7 uses libasan4.
For CentOS 7, to build everything successfully one would need to:
$ sudo yum install devtoolset-8-gcc devtoolset-libasan-devel
$ scl enable devtoolset-8 bash # set up environment
For Arch Linux to run selftests, one would need to install gcc-libs package to
get libasan.so.5:
$ sudo pacman -S gcc-libs
N.B. EXTRA_CFLAGS name wasn't used, because it's also used by libbpf's
Makefile and this causes few issues:
1. default "-g -Wall" flags are overriden;
2. compiling shared library with AddressSanitizer generates a bunch of symbols
like: "_GLOBAL__sub_D_00099_0_btf_dump.c", "_GLOBAL__sub_D_00099_0_bpf.c",
etc, which screws up versioned symbols check.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Julia Kartseva <hex@fb.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-3-andriin@fb.com
Ensure that test runner flavors include their own skeletons from <flavor>/
directory. Previously, skeletons generated for no-flavor test_progs were used.
Apart from fixing correctness, this also makes it possible to compile only
flavors individually:
$ make clean && make test_progs-no_alu32
... now succeeds ...
Fixes: 74b5a5968f ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-2-andriin@fb.com
As discussed at LPC 2019 ([0]), this patch brings (a quite belated) support
for declarative BTF-defined map-in-map support in libbpf. It allows to define
ARRAY_OF_MAPS and HASH_OF_MAPS BPF maps without any user-space initialization
code involved.
Additionally, it allows to initialize outer map's slots with references to
respective inner maps at load time, also completely declaratively.
Despite a weak type system of C, the way BTF-defined map-in-map definition
works, it's actually quite hard to accidentally initialize outer map with
incompatible inner maps. This being C, of course, it's still possible, but
even that would be caught at load time and error returned with helpful debug
log pointing exactly to the slot that failed to be initialized.
As an example, here's a rather advanced HASH_OF_MAPS declaration and
initialization example, filling slots #0 and #4 with two inner maps:
#include <bpf/bpf_helpers.h>
struct inner_map {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(max_entries, 1);
__type(key, int);
__type(value, int);
} inner_map1 SEC(".maps"),
inner_map2 SEC(".maps");
struct outer_hash {
__uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
__uint(max_entries, 5);
__uint(key_size, sizeof(int));
__array(values, struct inner_map);
} outer_hash SEC(".maps") = {
.values = {
[0] = &inner_map2,
[4] = &inner_map1,
},
};
Here's the relevant part of libbpf debug log showing pretty clearly of what's
going on with map-in-map initialization:
libbpf: .maps relo #0: for 6 value 0 rel.r_offset 96 name 260 ('inner_map1')
libbpf: .maps relo #0: map 'outer_arr' slot [0] points to map 'inner_map1'
libbpf: .maps relo #1: for 7 value 32 rel.r_offset 112 name 249 ('inner_map2')
libbpf: .maps relo #1: map 'outer_arr' slot [2] points to map 'inner_map2'
libbpf: .maps relo #2: for 7 value 32 rel.r_offset 144 name 249 ('inner_map2')
libbpf: .maps relo #2: map 'outer_hash' slot [0] points to map 'inner_map2'
libbpf: .maps relo #3: for 6 value 0 rel.r_offset 176 name 260 ('inner_map1')
libbpf: .maps relo #3: map 'outer_hash' slot [4] points to map 'inner_map1'
libbpf: map 'inner_map1': created successfully, fd=4
libbpf: map 'inner_map2': created successfully, fd=5
libbpf: map 'outer_hash': created successfully, fd=7
libbpf: map 'outer_hash': slot [0] set to map 'inner_map2' fd=5
libbpf: map 'outer_hash': slot [4] set to map 'inner_map1' fd=4
Notice from the log above that fd=6 (not logged explicitly) is used for inner
"prototype" map, necessary for creation of outer map. It is destroyed
immediately after outer map is created.
See also included selftest with some extra comments explaining extra details
of usage. Additionally, similar initialization syntax and libbpf functionality
can be used to do initialization of BPF_PROG_ARRAY with references to BPF
sub-programs. This can be done in follow up patches, if there will be a demand
for this.
[0] https://linuxplumbersconf.org/event/4/contributions/448/
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200429002739.48006-4-andriin@fb.com
Similar to commit b7a0d65d80 ("bpf, testing: Workaround a verifier failure for test_progs")
fix test_sysctl_prog.c as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
TLS 1.2 and TLS 1.3 differ in the implementation.
Use fixture parameters to run all tests for both
versions, and remove the one-off TLS 1.2 test.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow users to build parameterized variants of fixtures.
If fixtures want variants, they call FIXTURE_VARIANT() to declare
the structure to fill for each variant. Each fixture will be re-run
for each of the variants defined by calling FIXTURE_VARIANT_ADD()
with the differing parameters initializing the structure.
Since tests are being re-run, additional initialization (steps,
no_print) is also added.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that all tests have a fixture object move from a global
list of tests to a list of tests per fixture.
Order of tests may change as we will now group and run test
fixture by fixture, rather than in declaration order.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grouping tests by fixture will allow us to parametrize
test runs. Create full objects for fixtures.
Add a "global" fixture for tests without a fixture.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees suggest to factor out the list append code to a macro,
since following commits need it, which leads to code duplication.
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
New tests to check route dump and notifications with
net.ipv4.nexthop_compat_mode on and off.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
$(OUTPUT)/runqslower makefile target doesn't actually create runqslower
binary in the $(OUTPUT) directory. As lib.mk expects all
TEST_GEN_PROGS_EXTENDED (which runqslower is a part of) to be present in
the OUTPUT directory, this results in an error when running e.g. `make
install`:
rsync: link_stat "tools/testing/selftests/bpf/runqslower" failed: No
such file or directory (2)
Copy the binary into the OUTPUT directory after building it to fix the
error.
Fixes: 3a0d3092a4 ("selftests/bpf: Build runqslower from selftests")
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200428173742.2988395-1-vkabatov@redhat.com
Add test for matchall classifier with mirred egress mirror action.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
System hangs or killed rcutorture guest OSes can result in truncated
"Reader Pipe:" lines, which can in turn result in false-positive
reader-batch near-miss warnings. This commit therefore adjusts the
reader-batch checks to account for possible line truncation.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds a TRACE02 scenario which enables preemption and RCU
Tasks Trace IPIs, more specifically, disabling heavyweight readers.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
cls_redirect is a TC clsact based replacement for the glb-redirect iptables
module available at [1]. It enables what GitHub calls "second chance"
flows [2], similarly proposed by the Beamer paper [3]. In contrast to
glb-redirect, it also supports migrating UDP flows as long as connected
sockets are used. cls_redirect is in production at Cloudflare, as part of
our own L4 load balancer.
We have modified the encapsulation format slightly from glb-redirect:
glbgue_chained_routing.private_data_type has been repurposed to form a
version field and several flags. Both have been arranged in a way that
a private_data_type value of zero matches the current glb-redirect
behaviour. This means that cls_redirect will understand packets in
glb-redirect format, but not vice versa.
The test suite only covers basic features. For example, cls_redirect will
correctly forward path MTU discovery packets, but this is not exercised.
It is also possible to switch the encapsulation format to GRE on the last
hop, which is also not tested.
There are two major distinctions from glb-redirect: first, cls_redirect
relies on receiving encapsulated packets directly from a router. This is
because we don't have access to the neighbour tables from BPF, yet. See
forward_to_next_hop for details. Second, cls_redirect performs decapsulation
instead of using separate ipip and sit tunnel devices. This
avoids issues with the sit tunnel [4] and makes deploying the classifier
easier: decapsulated packets appear on the same interface, so existing
firewall rules continue to work as expected.
The code base started it's life on v4.19, so there are most likely still
hold overs from old workarounds. In no particular order:
- The function buf_off is required to defeat a clang optimization
that leads to the verifier rejecting the program due to pointer
arithmetic in the wrong order.
- The function pkt_parse_ipv6 is force inlined, because it would
otherwise be rejected due to returning a pointer to stack memory.
- The functions fill_tuple and classify_tcp contain kludges, because
we've run out of function arguments.
- The logic in general is rather nested, due to verifier restrictions.
I think this is either because the verifier loses track of constants
on the stack, or because it can't track enum like variables.
1: https://github.com/github/glb-director/tree/master/src/glb-redirect
2: https://github.com/github/glb-director/blob/master/docs/development/second-chance-design.md
3: https://www.usenix.org/conference/nsdi18/presentation/olteanu
4: https://github.com/github/glb-director/issues/64
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200424185556.7358-2-lmb@cloudflare.com
To make BPF verifier verbose log more releavant and easier to use to debug
verification failures, "pop" parts of log that were successfully verified.
This has effect of leaving only verifier logs that correspond to code branches
that lead to verification failure, which in practice should result in much
shorter and more relevant verifier log dumps. This behavior is made the
default behavior and can be overriden to do exhaustive logging by specifying
BPF_LOG_LEVEL2 log level.
Using BPF_LOG_LEVEL2 to disable this behavior is not ideal, because in some
cases it's good to have BPF_LOG_LEVEL2 per-instruction register dump
verbosity, but still have only relevant verifier branches logged. But for this
patch, I didn't want to add any new flags. It might be worth-while to just
rethink how BPF verifier logging is performed and requested and streamline it
a bit. But this trimming of successfully verified branches seems to be useful
and a good default behavior.
To test this, I modified runqslower slightly to introduce read of
uninitialized stack variable. Log (**truncated in the middle** to save many
lines out of this commit message) BEFORE this change:
; int handle__sched_switch(u64 *ctx)
0: (bf) r6 = r1
; struct task_struct *prev = (struct task_struct *)ctx[1];
1: (79) r1 = *(u64 *)(r6 +8)
func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
2: (b7) r2 = 0
; struct event event = {};
3: (7b) *(u64 *)(r10 -24) = r2
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 0
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
; if (prev->state == TASK_RUNNING)
[ ... instruction dump from insn #7 through #50 are cut out ... ]
51: (b7) r2 = 16
52: (85) call bpf_get_current_comm#16
last_idx 52 first_idx 42
regs=4 stack=0 before 51: (b7) r2 = 16
; bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
53: (bf) r1 = r6
54: (18) r2 = 0xffff8881f3868800
56: (18) r3 = 0xffffffff
58: (bf) r4 = r7
59: (b7) r5 = 32
60: (85) call bpf_perf_event_output#25
last_idx 60 first_idx 53
regs=20 stack=0 before 59: (b7) r5 = 32
61: (bf) r2 = r10
; event.pid = pid;
62: (07) r2 += -16
; bpf_map_delete_elem(&start, &pid);
63: (18) r1 = 0xffff8881f3868000
65: (85) call bpf_map_delete_elem#3
; }
66: (b7) r0 = 0
67: (95) exit
from 44 to 66: safe
from 34 to 66: safe
from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; bpf_map_update_elem(&start, &pid, &ts, 0);
28: (bf) r2 = r10
;
29: (07) r2 += -16
; tsp = bpf_map_lookup_elem(&start, &pid);
30: (18) r1 = 0xffff8881f3868000
32: (85) call bpf_map_lookup_elem#1
invalid indirect read from stack off -16+0 size 4
processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4
Notice how there is a successful code path from instruction 0 through 67, few
successfully verified jumps (44->66, 34->66), and only after that 11->28 jump
plus error on instruction #32.
AFTER this change (full verifier log, **no truncation**):
; int handle__sched_switch(u64 *ctx)
0: (bf) r6 = r1
; struct task_struct *prev = (struct task_struct *)ctx[1];
1: (79) r1 = *(u64 *)(r6 +8)
func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
2: (b7) r2 = 0
; struct event event = {};
3: (7b) *(u64 *)(r10 -24) = r2
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 0
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
; if (prev->state == TASK_RUNNING)
7: (79) r2 = *(u64 *)(r1 +16)
; if (prev->state == TASK_RUNNING)
8: (55) if r2 != 0x0 goto pc+19
R1_w=ptr_task_struct(id=0,off=0,imm=0) R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; trace_enqueue(prev->tgid, prev->pid);
9: (61) r1 = *(u32 *)(r1 +1184)
10: (63) *(u32 *)(r10 -4) = r1
; if (!pid || (targ_pid && targ_pid != pid))
11: (15) if r1 == 0x0 goto pc+16
from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; bpf_map_update_elem(&start, &pid, &ts, 0);
28: (bf) r2 = r10
;
29: (07) r2 += -16
; tsp = bpf_map_lookup_elem(&start, &pid);
30: (18) r1 = 0xffff8881db3ce800
32: (85) call bpf_map_lookup_elem#1
invalid indirect read from stack off -16+0 size 4
processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4
Notice how in this case, there are 0-11 instructions + jump from 11 to
28 is recorded + 28-32 instructions with error on insn #32.
test_verifier test runner was updated to specify BPF_LOG_LEVEL2 for
VERBOSE_ACCEPT expected result due to potentially "incomplete" success verbose
log at BPF_LOG_LEVEL1.
On success, verbose log will only have a summary of number of processed
instructions, etc, but no branch tracing log. Having just a last succesful
branch tracing seemed weird and confusing. Having small and clean summary log
in success case seems quite logical and nice, though.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200423195850.1259827-1-andriin@fb.com
Currently the following prog types don't fall back to bpf_base_func_proto()
(instead they have cgroup_base_func_proto which has a limited set of
helpers from bpf_base_func_proto):
* BPF_PROG_TYPE_CGROUP_DEVICE
* BPF_PROG_TYPE_CGROUP_SYSCTL
* BPF_PROG_TYPE_CGROUP_SOCKOPT
I don't see any specific reason why we shouldn't use bpf_base_func_proto(),
every other type of program (except bpf-lirc and, understandably, tracing)
use it, so let's fall back to bpf_base_func_proto for those prog types
as well.
This basically boils down to adding access to the following helpers:
* BPF_FUNC_get_prandom_u32
* BPF_FUNC_get_smp_processor_id
* BPF_FUNC_get_numa_node_id
* BPF_FUNC_tail_call
* BPF_FUNC_ktime_get_ns
* BPF_FUNC_spin_lock (CAP_SYS_ADMIN)
* BPF_FUNC_spin_unlock (CAP_SYS_ADMIN)
* BPF_FUNC_jiffies64 (CAP_SYS_ADMIN)
I've also added bpf_perf_event_output() because it's really handy for
logging and debugging.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200420174610.77494-1-sdf@google.com
Pull networking fixes from David Miller:
1) Fix memory leak in netfilter flowtable, from Roi Dayan.
2) Ref-count leaks in netrom and tipc, from Xiyu Yang.
3) Fix warning when mptcp socket is never accepted before close, from
Florian Westphal.
4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.
5) Fix large delays during PTP synchornization in cxgb4, from Rahul
Lakkireddy.
6) team_mode_get() can hang, from Taehee Yoo.
7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver,
from Niklas Schnelle.
8) Fix handling of bpf XADD on BTF memory, from Jann Horn.
9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.
10) Missing queue memory release in iwlwifi pcie code, from Johannes
Berg.
11) Fix NULL deref in macvlan device event, from Taehee Yoo.
12) Initialize lan87xx phy correctly, from Yuiko Oshino.
13) Fix looping between VRF and XFRM lookups, from David Ahern.
14) etf packet scheduler assumes all sockets are full sockets, which is
not necessarily true. From Eric Dumazet.
15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.
16) fib_select_default() needs to handle nexthop objects, from David
Ahern.
17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.
18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.
19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.
20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix
from Luke Nelson.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
selftests/bpf: Fix a couple of broken test_btf cases
tools/runqslower: Ensure own vmlinux.h is picked up first
bpf: Make bpf_link_fops static
bpftool: Respect the -d option in struct_ops cmd
selftests/bpf: Add test for freplace program with expected_attach_type
bpf: Propagate expected_attach_type when verifying freplace programs
bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
bpf, x86_32: Fix logic error in BPF_LDX zero-extension
bpf, x86_32: Fix clobbering of dst for BPF_JSET
bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
bpf: Fix reStructuredText markup
net: systemport: suppress warnings on failed Rx SKB allocations
net: bcmgenet: suppress warnings on failed Rx SKB allocations
macsec: avoid to set wrong mtu
mac80211: sta_info: Add lockdep condition for RCU list usage
mac80211: populate debugfs only after cfg80211 init
net: bcmgenet: correct per TX/RX ring statistics
net: meth: remove spurious copyright text
net: phy: bcm84881: clear settings on link down
chcr: Fix CPU hard lockup
...
Alexei Starovoitov says:
====================
pull-request: bpf 2020-04-24
The following pull-request contains BPF updates for your *net* tree.
We've added 17 non-merge commits during the last 5 day(s) which contain
a total of 19 files changed, 203 insertions(+), 85 deletions(-).
The main changes are:
1) link_update fix, from Andrii.
2) libbpf get_xdp_id fix, from David.
3) xadd verifier fix, from Jann.
4) x86-32 JIT fixes, from Luke and Wang.
5) test_btf fix, from Stanislav.
6) freplace verifier fix, from Toke.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a new selftest that tests the ability to attach an freplace
program to a program type that relies on the expected_attach_type of the
target program to pass verification.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158773526831.293902.16011743438619684815.stgit@toke.dk
It is possible to get multiple records from trace during test and then more
than 4 arguments are assigned to ARGS. This situation results in the failure
of kprobe_args_type.tc. For example:
-----------------------------------------------------------
grep testprobe trace
ftracetest-5902 [001] d... 111195.682227: testprobe: (_do_fork+0x0/0x460) arg1=334823024 arg2=334823024 arg3=0x13f4fe70 arg4=7
pmlogger-5949 [000] d... 111195.709898: testprobe: (_do_fork+0x0/0x460) arg1=345308784 arg2=345308784 arg3=0x1494fe70 arg4=7
grep testprobe trace
sed -e 's/.* arg1=\(.*\) arg2=\(.*\) arg3=\(.*\) arg4=\(.*\)/\1 \2 \3 \4/'
ARGS='334823024 334823024 0x13f4fe70 7
345308784 345308784 0x1494fe70 7'
-----------------------------------------------------------
We don't care which process calls do_fork so just check the first record to
fix the issue.
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add build/cross-build dependency check script kselftest_deps.sh
This script does the following:
Usage: ./kselftest_deps.sh -[p] <compiler> [test_name]
kselftest_deps.sh [-p] gcc
kselftest_deps.sh [-p] gcc vm
kselftest_deps.sh [-p] aarch64-linux-gnu-gcc
kselftest_deps.sh [-p] aarch64-linux-gnu-gcc vm
- Should be run in selftests directory in the kernel repo.
- Checks if Kselftests can be built/cross-built on a system.
- Parses all test/sub-test Makefile to find library dependencies.
- Runs compile test on a trivial C file with LDLIBS specified
in the test Makefiles to identify missing library dependencies.
- Prints suggested target list for a system filtering out tests
failed the build dependency check from the TARGETS in Selftests
the main Makefile when optional -p is specified.
- Prints pass/fail dependency check for each tests/sub-test.
- Prints pass/fail targets and libraries.
- Default: runs dependency checks on all tests.
- Optional test name can be specified to check dependencies for it.
To make LDLIBS parsing easier
- change gpio and memfd Makefiles to use the same temporary variable used
to find and add libraries to LDLIBS.
- simlify LDLIBS append logic in intel_pstate/Makefile.
Results from run on x86_64 system (trimmed detailed pass/fail list):
========================================================
Kselftest Dependency Check for [./kselftest_deps.sh gcc ] results...
========================================================
Checked tests defining LDLIBS dependencies
--------------------------------------------------------
Total tests with Dependencies:
55 Pass: 53 Fail: 2
--------------------------------------------------------
Targets passed build dependency check on system:
bpf capabilities filesystems futex gpio intel_pstate membarrier memfd
mqueue net powerpc ptp rseq rtc safesetid timens timers vDSO vm
--------------------------------------------------------
FAIL: netfilter/Makefile dependency check: -lmnl
FAIL: gpio/Makefile dependency check: -lmount
--------------------------------------------------------
Targets failed build dependency check on system:
gpio netfilter
--------------------------------------------------------
Missing libraries system
-lmnl -lmount
--------------------------------------------------------
========================================================
Results from run on x86_64 system with aarch64-linux-gnu-gcc:
(trimmed detailed pass/fail list):
========================================================
Kselftest Dependency Check for [./kselftest_deps.sh aarch64-linux-gnu-gcc ]
results...
========================================================
Checked tests defining LDLIBS dependencies
--------------------------------------------------------
Total tests with Dependencies:
55 Pass: 41 Fail: 14
--------------------------------------------------------
Targets failed build dependency check on system:
bpf capabilities filesystems futex gpio intel_pstate membarrier memfd
mqueue net powerpc ptp rseq rtc timens timers vDSO vm
--------------------------------------------------------
--------------------------------------------------------
Targets failed build dependency check on system:
bpf capabilities gpio memfd mqueue net netfilter safesetid vm
--------------------------------------------------------
Missing libraries system
-lcap -lcap-ng -lelf -lfuse -lmnl -lmount -lnuma -lpopt -lz
--------------------------------------------------------
========================================================
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Without CONFIG_DYNAMIC_FTRACE, some tests get failure because required
filter files(set_ftrace_filter/available_filter_functions/stack_trace_filter)
are missing. So implement check_filter_file() and make all related tests
check required filter files by it.
BTW: set_ftrace_filter and available_filter_functions are introduced together
so just check either of them.
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add nodad when adding IPv6 addresses and remove the sleep.
A recent change to iproute2 moved the 'pref medium' to the prefix
(where it belongs). Change the expected route check to strip
'pref medium' to be compatible with old and new iproute2.
Add IPv4 runtime test with an IPv6 address as the gateway in
the default route.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a self-test for the IPv6 dsfield munge that iproute2 will support.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the pedit_dsfield forwarding selftest with coverage of "pedit ex
munge ip6 dsfield set".
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tests for vrf and xfrms with a second round after adding a
qdisc. There are a few known problems documented with the test
cases that fail. The fix is non-trivial; will come back to it
when time allows.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_tests is spewing errors:
...
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
ping: connect: Network is unreachable
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
...
Each test entry in fib_tests is supposed to do its own setup and
cleanup. Right now the $IP commands in fib_suppress_test are
failing because there is no ns1. Add the setup/cleanup and logging
expected for each test.
Fixes: ca7a03c417 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This kselftest update for Linux 5.7-rc3 consists of fixes to runner
scripts and individual test run-time bugs. Includes fixes to tpm2
and memfd test run-time regressions.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl6gcfsACgkQCwJExA0N
QxxmFQ/+NCoDIatTBqn6sFWC17RuwYJQDkcuOF1NM45IXUGt0URUIxRbmM0DlDJ6
Y3HUCFjXdTTzS4cC/8lIFIH7Gf4NTFswviODC9qfeE9zi4BXwG9CyJZpOsPhlo5T
aZsjTVgJfeN4EgjXKn/9MWCPE1u3pgMPUfZRSsnQyEjT0tgGYChq/rADc8Y1Mgax
9GnOf9QoZ21gZhWRJVxwPVDDik4OU6GhmILee2GULtl2O/tecnsbSisnQKS/WHoD
qCnNfxuaZAlpO7Ma+ywtxURuXspE9nWnsJmckCQ018WvpN26LymyRmAsDul0R2oU
nTk3YhiMo8TB6nFq1BEE+ti3oi7dN3DndAM25++PjxMyFKh4UVvaWw6V38ITVvNd
e5zS57CmYEA6gOdN05xf0Uk/mSfdbsdIPGsiql49YoFjrXNzaWxEE+kkgxul+wht
GjBcJwxDPwfL7GQ+SgcxANffIwn2N4/welPAW2dc5Nhx5mGZZMBEfKZWhq/vwXn4
mmmr2oX3V9tj0Hlo8g8jgSRknRFwMOjlyvtNdjDOrqDsumPDueWUuJMX0rGiDJU/
W8w3mjrI5WdPbH7qKG/xWARhKxVhSkSh+TtNjz5Rj4oQ2A70/mxlCp9R+PmZlEqJ
OjorLur3J88J4MdSSiQJvInJPQ5Bd80v0oz/ZXlQVs1S6zzQHyk=
=6l5V
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This consists of fixes to runner scripts and individual test run-time
bugs. Includes fixes to tpm2 and memfd test run-time regressions"
* tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ipc: Fix test failure seen after initial test run
Revert "Kernel selftests: tpm2: check for tpm support"
selftests/ftrace: Add CONFIG_SAMPLE_FTRACE_DIRECT=m kconfig
selftests/seccomp: allow clock_nanosleep instead of nanosleep
kselftest/runner: allow to properly deliver signals to tests
selftests/harness: fix spelling mistake "SIGARLM" -> "SIGALRM"
selftests: Fix memfd test run-time regression
selftests: vm: Fix 64-bit test builds for powerpc64le
selftests: vm: Do not override definition of ARCH
The hidepid parameter values are becoming more and more and it becomes
difficult to remember what each new magic number means.
Backward compatibility is preserved since it is possible to specify
numerical value for the hidepid parameter. This does not break the
fsconfig since it is not possible to specify a numerical value through
it. All numeric values are converted to a string. The type
FSCONFIG_SET_BINARY cannot be used to indicate a numerical value.
Selftest has been added to verify this behavior.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
This patch allows to have multiple procfs instances inside the
same pid namespace. The aim here is lightweight sandboxes, and to allow
that we have to modernize procfs internals.
1) The main aim of this work is to have on embedded systems one
supervisor for apps. Right now we have some lightweight sandbox support,
however if we create pid namespacess we have to manages all the
processes inside too, where our goal is to be able to run a bunch of
apps each one inside its own mount namespace without being able to
notice each other. We only want to use mount namespaces, and we want
procfs to behave more like a real mount point.
2) Linux Security Modules have multiple ptrace paths inside some
subsystems, however inside procfs, the implementation does not guarantee
that the ptrace() check which triggers the security_ptrace_check() hook
will always run. We have the 'hidepid' mount option that can be used to
force the ptrace_may_access() check inside has_pid_permissions() to run.
The problem is that 'hidepid' is per pid namespace and not attached to
the mount point, any remount or modification of 'hidepid' will propagate
to all other procfs mounts.
This also does not allow to support Yama LSM easily in desktop and user
sessions. Yama ptrace scope which restricts ptrace and some other
syscalls to be allowed only on inferiors, can be updated to have a
per-task context, where the context will be inherited during fork(),
clone() and preserved across execve(). If we support multiple private
procfs instances, then we may force the ptrace_may_access() on
/proc/<pids>/ to always run inside that new procfs instances. This will
allow to specifiy on user sessions if we should populate procfs with
pids that the user can ptrace or not.
By using Yama ptrace scope, some restricted users will only be able to see
inferiors inside /proc, they won't even be able to see their other
processes. Some software like Chromium, Firefox's crash handler, Wine
and others are already using Yama to restrict which processes can be
ptracable. With this change this will give the possibility to restrict
/proc/<pids>/ but more importantly this will give desktop users a
generic and usuable way to specifiy which users should see all processes
and which users can not.
Side notes:
* This covers the lack of seccomp where it is not able to parse
arguments, it is easy to install a seccomp filter on direct syscalls
that operate on pids, however /proc/<pid>/ is a Linux ABI using
filesystem syscalls. With this change LSMs should be able to analyze
open/read/write/close...
In the new patch set version I removed the 'newinstance' option
as suggested by Eric W. Biederman.
Selftest has been added to verify new behavior.
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
On AMD, the state of the VMCB is undefined after a shutdown VMEXIT. KVM
takes a very conservative approach to that and resets the guest altogether
when that happens. This causes the set_memory_region_test to fail
because the RIP is 0xfff0 (the reset vector). Restrict the RIP test
to KVM_EXIT_INTERNAL_ERROR in order to fix this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Include a README file with the instructions to use the
testcases at selftests/powerpc/nx-gzip.
Signed-off-by: Bulent Abali <abali@us.ibm.com>
Signed-off-by: Raphael Moreira Zinsly <rzinsly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200420205538.25181-6-rzinsly@linux.ibm.com
Include a decompression testcase for the powerpc NX-GZIP
engine.
Signed-off-by: Bulent Abali <abali@us.ibm.com>
Signed-off-by: Raphael Moreira Zinsly <rzinsly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200420205538.25181-5-rzinsly@linux.ibm.com
Add files to be able to compress and decompress files using the
powerpc NX-GZIP engine.
Signed-off-by: Bulent Abali <abali@us.ibm.com>
Signed-off-by: Raphael Moreira Zinsly <rzinsly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200420205538.25181-3-rzinsly@linux.ibm.com
This patch adds a test to test_verifier that writes the lower 8 bits of
R10 (aka FP) using BPF_B to an array map and reads the result back. The
expected behavior is that the result should be the same as first copying
R10 to R9, and then storing / loading the lower 8 bits of R9.
This test catches a bug that was present in the x86-64 JIT that caused
an incorrect encoding for BPF_STX BPF_B when the source operand is R10.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200418232655.23870-2-luke.r.nels@gmail.com
When check_xadd() verifies an XADD operation on a pointer to a stack slot
containing a spilled pointer, check_stack_read() verifies that the read,
which is part of XADD, is valid. However, since the placeholder value -1 is
passed as `value_regno`, check_stack_read() can only return a binary
decision and can't return the type of the value that was read. The intent
here is to verify whether the value read from the stack slot may be used as
a SCALAR_VALUE; but since check_stack_read() doesn't check the type, and
the type information is lost when check_stack_read() returns, this is not
enforced, and a malicious user can abuse XADD to leak spilled kernel
pointers.
Fix it by letting check_stack_read() verify that the value is usable as a
SCALAR_VALUE if no type information is passed to the caller.
To be able to use __is_pointer_value() in check_stack_read(), move it up.
Fix up the expected unprivileged error message for a BPF selftest that,
until now, assumed that unprivileged users can use XADD on stack-spilled
pointers. This also gives us a test for the behavior introduced in this
patch for free.
In theory, this could also be fixed by forbidding XADD on stack spills
entirely, since XADD is a locked operation (for operations on memory with
concurrency) and there can't be any concurrency on the BPF stack; but
Alexei has said that he wants to keep XADD on stack slots working to avoid
changes to the test suite [1].
The following BPF program demonstrates how to leak a BPF map pointer as an
unprivileged user using this bug:
// r7 = map_pointer
BPF_LD_MAP_FD(BPF_REG_7, small_map),
// r8 = launder(map_pointer)
BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_7, -8),
BPF_MOV64_IMM(BPF_REG_1, 0),
((struct bpf_insn) {
.code = BPF_STX | BPF_DW | BPF_XADD,
.dst_reg = BPF_REG_FP,
.src_reg = BPF_REG_1,
.off = -8
}),
BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_FP, -8),
// store r8 into map
BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_7),
BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_FP),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, -4),
BPF_ST_MEM(BPF_W, BPF_REG_ARG2, 0, 0),
BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_8, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN()
[1] https://lore.kernel.org/bpf/20200416211116.qxqcza5vo2ddnkdq@ast-mbp.dhcp.thefacebook.com/
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200417000007.10734-1-jannh@google.com
Add PMTU discovery tests for these encapsulations:
- IPIP
- SIT, mode ip6ip
- ip6tnl, modes ip6ip6 and ipip6
Signed-off-by: Lourdes Pedrajas <lu@pplo.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Disable RISCV BPF JIT builds when !MMU, from Björn Töpel.
2) nf_tables leaves dangling pointer after free, fix from Eric Dumazet.
3) Out of boundary write in __xsk_rcv_memcpy(), fix from Li RongQing.
4) Adjust icmp6 message source address selection when routes have a
preferred source address set, from Tim Stallard.
5) Be sure to validate HSR protocol version when creating new links,
from Taehee Yoo.
6) CAP_NET_ADMIN should be sufficient to manage l2tp tunnels even in
non-initial namespaces, from Michael Weiß.
7) Missing release firmware call in mlx5, from Eran Ben Elisha.
8) Fix variable type in macsec_changelink(), caught by KASAN. Fix from
Taehee Yoo.
9) Fix pause frame negotiation in marvell phy driver, from Clemens
Gruber.
10) Record RX queue early enough in tun packet paths such that XDP
programs will see the correct RX queue index, from Gilberto Bertin.
11) Fix double unlock in mptcp, from Florian Westphal.
12) Fix offset overflow in ARM bpf JIT, from Luke Nelson.
13) marvell10g needs to soft reset PHY when coming out of low power
mode, from Russell King.
14) Fix MTU setting regression in stmmac for some chip types, from
Florian Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
amd-xgbe: Use __napi_schedule() in BH context
mISDN: make dmril and dmrim static
net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
tipc: fix incorrect increasing of link window
Documentation: Fix tcp_challenge_ack_limit default value
net: tulip: make early_486_chipsets static
dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400
ipv6: remove redundant assignment to variable err
net/rds: Use ERR_PTR for rds_message_alloc_sgs()
net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge
selftests/bpf: Check for correct program attach/detach in xdp_attach test
libbpf: Fix type of old_fd in bpf_xdp_set_link_opts
libbpf: Always specify expected_attach_type on program load if supported
xsk: Add missing check on user supplied headroom size
mac80211: fix channel switch trigger from unknown mesh peer
mac80211: fix race in ieee80211_register_hw()
net: marvell10g: soft-reset the PHY when coming out of low power
net: marvell10g: report firmware version
net/cxgb4: Check the return from t4_query_params properly
...
This patch introduces test_add_max_memory_regions(), which checks
that a VM can have added memory slots up to the limit defined in
KVM_CAP_NR_MEMSLOTS. Then attempt to add one more slot to
verify it fails as expected.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410231707.7128-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make set_memory_region_test available on all architectures by wrapping
the bits that are x86-specific in ifdefs. A future testcase
to create the maximum number of memslots will be architecture
agnostic.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410231707.7128-10-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a testcase for running a guest with no memslots to the memory region
test. The expected result on x86_64 is that the guest will trigger an
internal KVM error due to the initial code fetch encountering a
non-existent memslot and resulting in an emulation failure.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410231707.7128-9-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduces the vm_get_fd() function in kvm_util which returns
the VM file descriptor.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410231707.7128-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a testcase for deleting memslots while the guest is running.
Like the "move" testcase, this is x86_64-only as it relies on MMIO
happening when a non-existent memslot is encountered.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410231707.7128-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use sem_post() and sem_timedwait() to synchronize test stages between
the vCPU thread and the main thread instead of using usleep() to wait
for the vCPU thread and hoping for the best.
Opportunistically refactor the code to make it suck less in general,
and to prepare for adding more testcases.
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410231707.7128-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add variants of GUEST_ASSERT to pass values back to the host, e.g. to
help debug/understand a failure when the the cause of the assert isn't
necessarily binary.
It'd probably be possible to auto-calculate the number of arguments and
just have a single GUEST_ASSERT, but there are a limited number of
variants and silently eating arguments could lead to subtle code bugs.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410231707.7128-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a utility to delete a memory region, it will be used by x86's
set_memory_region_test.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20200410231707.7128-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace the KVM selftests' homebrewed linked lists for vCPUs and memory
regions with the kernel's 'struct list_head'.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20200410231707.7128-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The sole caller of vm_vcpu_rm() already has the vcpu pointer, take it
directly instead of doing an extra lookup.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20200410231707.7128-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Ahern noticed that there was a bug in the EXPECTED_FD code so
programs did not get detached properly when that parameter was supplied.
This case was not included in the xdp_attach tests; so let's add it to be
sure that such a bug does not sneak back in down.
Fixes: 87854a0b57 ("selftests/bpf: Add tests for attaching XDP programs")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200414145025.182163-2-toke@redhat.com
For some types of BPF programs that utilize expected_attach_type, libbpf won't
set load_attr.expected_attach_type, even if expected_attach_type is known from
section definition. This was done to preserve backwards compatibility with old
kernels that didn't recognize expected_attach_type attribute yet (which was
added in 5e43f899b0 ("bpf: Check attach type at prog load time"). But this
is problematic for some BPF programs that utilize newer features that require
kernel to know specific expected_attach_type (e.g., extended set of return
codes for cgroup_skb/egress programs).
This patch makes libbpf specify expected_attach_type by default, but also
detect support for this field in kernel and not set it during program load.
This allows to have a good metadata for bpf_program
(e.g., bpf_program__get_extected_attach_type()), but still work with old
kernels (for cases where it can work at all).
Additionally, due to expected_attach_type being always set for recognized
program types, bpf_program__attach_cgroup doesn't have to do extra checks to
determine correct attach type, so remove that additional logic.
Also adjust section_names selftest to account for this change.
More detailed discussion can be found in [0].
[0] https://lore.kernel.org/bpf/20200412003604.GA15986@rdna-mbp.dhcp.thefacebook.com/
Fixes: 5cf1e91456 ("bpf: cgroup inet skb programs can return 0 to 3")
Fixes: 5e43f899b0 ("bpf: Check attach type at prog load time")
Reported-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/20200414182645.1368174-1-andriin@fb.com
Test that frozen and mmap()'ed BPF map can't be mprotect()'ed as writable or
executable memory. Also validate that "downgrading" from writable to read-only
doesn't screw up internal writable count accounting for the purposes of map
freezing.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200410202613.3679837-2-andriin@fb.com
After successfully running the IPC msgque test once, subsequent runs
result in a test failure:
$ sudo ./run_kselftest.sh
TAP version 13
1..1
# selftests: ipc: msgque
# Failed to get stats for IPC queue with id 0
# Failed to dump queue: -22
# Bail out!
# # Pass 0 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0
not ok 1 selftests: ipc: msgque # exit=1
The dump_queue() function loops through the possible message queue index
values using calls to msgctl(kern_id, MSG_STAT, ...) where kern_id
represents the index value. The first time the test is ran, the initial
index value of 0 is valid and the test is able to complete. The index
value of 0 is not valid in subsequent test runs and the loop attempts to
try index values of 1, 2, 3, and so on until a valid index value is
found that corresponds to the message queue created earlier in the test.
The msgctl() syscall returns -1 and sets errno to EINVAL when invalid
index values are used. The test failure is caused by incorrectly
comparing errno to -EINVAL when cycling through possible index values.
Fix invalid test failures on subsequent runs of the msgque test by
correctly comparing errno values to a non-negated EINVAL.
Fixes: 3a665531a3 ("selftests: IPC message queue copy feature test")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This reverts commit b32694cd07.
The original comment was neither reviewed nor tested. Thus, this the
*only* possible action to take.
Cc: Nikita Sobolev <Nikita.Sobolev@synopsys.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
ftrace-direct.tc and kprobe-direct.tc require CONFIG_SAMPLE_FTRACE_DIRECT=m
so add it to config file which is used by merge_config.sh.
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
glibc 2.31 calls clock_nanosleep when its nanosleep function is used. So
the restart_syscall fails after that. In order to deal with it, we trace
clock_nanosleep and nanosleep. Then we check for either.
This works just fine on systems with both glibc 2.30 and glibc 2.31,
whereas it failed before on a system with glibc 2.31.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
While running seccomp_bpf, kill_after_ptrace() gets stuck if we run it
via /usr/bin/timeout (that is the default), until the timeout expires.
This is because /usr/bin/timeout is preventing to properly deliver
signals to ptrace'd children (SIGSYS in this case).
This problem can be easily reproduced by running:
$ sudo make TARGETS=seccomp kselftest
...
# [ RUN ] TRACE_syscall.skip_a#
not ok 1 selftests: seccomp: seccomp_bpf # TIMEOUT
The test is hanging at this point until the timeout expires and then it
reports the timeout error.
Prevent this problem by passing --foreground to /usr/bin/timeout,
allowing to properly deliver signals to children processes.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
There a few identical spelling mistakes, fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Make the doublefault exception handler unconditional on 32-bit. Yes,
it is important to be able to catch #DF exceptions instead of silent
reboots. Yes, the code size increase is worth every byte. And one less
CONFIG symbol is just the cherry on top.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200404083646.8897-1-bp@alien8.de
Commit d3fd949abd ("selftests: Fix memfd to support relocatable
build (O=objdir)") introduced regression run-time regression with
a change to include programs that should be run from shell scripts
to list of programs that run as independent tests. This fix restores
the original designation.
Fixes: d3fd949abd ("selftests: Fix memfd to support relocatable build (O=objdir)")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Some tests are built only for 64-bit systems. This makes
sure that these tests are built for both big and little
endian variants of powerpc64.
Fixes: 7549b33642 ("selftests: vm: Build/Run 64bit tests only on 64bit arch")
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Independent builds of the vm selftests is currently broken because
commit 7549b33642 ("selftests: vm: Build/Run 64bit tests only on
64bit arch") overrides the value of ARCH with the machine name from
uname. This does not always match the architecture names used for
tasks like header installation.
E.g. for building tests on powerpc64, we need ARCH=powerpc
and not ARCH=ppc64 or ARCH=ppc64le. Otherwise, the build
fails as shown below.
$ uname -m
ppc64le
$ make -C tools/testing/selftests/vm
make: Entering directory '/home/sandipan/linux/tools/testing/selftests/vm'
make --no-builtin-rules ARCH=ppc64le -C ../../../.. headers_install
make[1]: Entering directory '/home/sandipan/linux'
Makefile:653: arch/ppc64le/Makefile: No such file or directory
make[1]: *** No rule to make target 'arch/ppc64le/Makefile'. Stop.
make[1]: Leaving directory '/home/sandipan/linux'
../lib.mk:50: recipe for target 'khdr' failed
make: *** [khdr] Error 2
make: Leaving directory '/home/sandipan/linux/tools/testing/selftests/vm'
Fixes: 7549b33642 ("selftests: vm: Build/Run 64bit tests only on 64bit arch")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Test that request_module() fails with -ENOENT when
/proc/sys/kernel/modprobe contains (a) a nonexistent path, and (b) an
empty path.
Case (b) is a regression test for the patch "kmod: make request_module()
return an error when autoloading is disabled".
Tested with 'kmod.sh -t 0010 && kmod.sh -t 0011', and also simply with
'kmod.sh' to run all kmod tests.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: NeilBrown <neilb@suse.com>
Link: http://lkml.kernel.org/r/20200312202552.241885-5-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_test_count() and get_test_enabled() were broken for test numbers
above 9 due to awk interpreting a field specification like '$0010' as
octal rather than decimal. Fix it by stripping the leading zeroes.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: NeilBrown <neilb@suse.com>
Link: http://lkml.kernel.org/r/20200318230515.171692-5-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Borkmann says:
====================
pull-request: bpf 2020-04-10
The following pull-request contains BPF updates for your *net* tree.
We've added 13 non-merge commits during the last 7 day(s) which contain
a total of 13 files changed, 137 insertions(+), 43 deletions(-).
The main changes are:
1) JIT code emission fixes for riscv and arm32, from Luke Nelson and Xi Wang.
2) Disable vmlinux BTF info if GCC_PLUGIN_RANDSTRUCT is used, from Slava Bacherikov.
3) Fix oob write in AF_XDP when meta data is used, from Li RongQing.
4) Fix bpf_get_link_xdp_id() handling on single prog when flags are specified,
from Andrey Ignatov.
5) Fix sk_assign() BPF helper for request sockets that can have sk_reuseport
field uninitialized, from Joe Stringer.
6) Fix mprotect() test case for the BPF LSM, from KP Singh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- A fix for a crash in machine check handling on pseries (ie. guests)
- A small series to make it possible to disable CONFIG_COMPAT, and turn it off
by default for ppc64le where it's not used.
- A few other miscellaneous fixes and small improvements.
Thanks to:
Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann, Christophe Leroy, Dan
Carpenter, Ganesh Goudar, Geert Uytterhoeven, Geoff Levand, Mahesh Salgaonkar,
Markus Elfring, Michal Suchanek, Nicholas Piggin, Stephen Boyd, Wen Xiong.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl6O8LgTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgMcYEACbGf+Z9brLSasYajoqU6QdqGPacHEN
1a9TEmUnN+HWgtfkkoEBFbyuYHnhyhYuf7hvNccDjDA91ESVylO+Wq7Q+v/xoz29
LVyb3V6uuVMLHnoqwP5jpr0lS0aOpuu3Nc2SpfBuolDtJqeMxpVEEK3Ln3uATlq6
SnEAxQEKb2x4Y4Tfuq5A3txupj0s/UYrmeR6GAdkN3Oapbb9uQl8Ql2smqrZo0cq
6TLxtoFzbfXkV6NmY2V0OKMTeXt0fhrvdDhFEBckCUpRZLv4Fd7CwPWNER2bUVs6
04kg87BAO8qRyfr3G93oP0mWgi65kpXI8yN6Vt5Lig+5PnxNh7swvEqDpQR1s9se
uVoeN7RBHOKQppZRkpzf6yZbelCcMuwTeIBQ9XOx/jYFAHJ2nUkJm9TYgKckVvNY
E4shiM1eoOVMrEmODFBCUmUkJLyn5jU1+r5mn708v2Nb5E1XgoTejitB6bHyL+Aa
zo/0DdsZO86iNE7th94oHQRgVbx1vtP9kV6vK6BLB5M95RSGEdVAEMo5CTL70wr+
hz7suOaijPi+TeCW9YPrcOHcHOQE9pzTFQoVZHuv8egbvd9Rni3aBAMMqHx/lQdE
Hk4zN7IytOLqQTfINNYgoo0kUKI1ipJQOBfR5jyeLhgg6yp36CnO/m+0QGvrILbi
18tlf2qkD75Z3g==
=72sb
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"The bulk of this is the series to make CONFIG_COMPAT user-selectable,
it's been around for a long time but was blocked behind the
syscall-in-C series.
Plus there's also a few fixes and other minor things.
Summary:
- A fix for a crash in machine check handling on pseries (ie. guests)
- A small series to make it possible to disable CONFIG_COMPAT, and
turn it off by default for ppc64le where it's not used.
- A few other miscellaneous fixes and small improvements.
Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann,
Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven,
Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek,
Nicholas Piggin, Stephen Boyd, Wen Xiong"
* tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Always build the tm-poison test 64-bit
powerpc: Improve ppc_save_regs()
Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory
powerpc/perf: split callchain.c by bitness
powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
powerpc/64: make buildable without CONFIG_COMPAT
powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
powerpc/perf: consolidate read_user_stack_32
powerpc: move common register copy functions from signal_32.c to signal.c
powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig
powerpc/ps3: Remove an unneeded NULL check
powerpc/ps3: Remove duplicate error message
powerpc/powernv: Re-enable imc trace-mode in kernel
powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
powerpc/pseries: Fix MCE handling on pseries
selftests/eeh: Skip ahci adapters
powerpc/64s: Fix doorbell wakeup msgclr optimisation
In set_operation_mode() function remove duplicated check for args.list
parameter, which is already done one line before.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add xdp_info selftest that makes sure that bpf_get_link_xdp_id returns
valid prog_id for different input modes:
* w/ and w/o flags when no program is attached;
* w/ and w/o flags when one program is attached.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/2a9a6d1ce33b91ccc1aa3de6dba2d309f2062811.1586236080.git.rdna@fb.com
This testcase repeats epollbug.c from the bug:
https://bugzilla.kernel.org/show_bug.cgi?id=205933
What it tests? It tests the race between epoll_ctl() and epoll_wait().
New event mask passed to epoll_ctl() triggers wake up, which can be missed
because of the bug described in the link. Reproduction is 100%, so easy
to fix. Kudos, Max, for wonderful test case.
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Link: http://lkml.kernel.org/r/20200214170211.561524-2-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add uffd tests for write protection.
Instead of introducing new tests for it, let's simply squashing uffd-wp
tests into existing uffd-missing test cases. Changes are:
(1) Bouncing tests
We do the write-protection in two ways during the bouncing test:
- By using UFFDIO_COPY_MODE_WP when resolving MISSING pages: then
we'll make sure for each bounce process every single page will be
at least fault twice: once for MISSING, once for WP.
- By direct call UFFDIO_WRITEPROTECT on existing faulted memories:
To further torture the explicit page protection procedures of
uffd-wp, we split each bounce procedure into two halves (in the
background thread): the first half will be MISSING+WP for each
page as explained above. After the first half, we write protect
the faulted region in the background thread to make sure at least
half of the pages will be write protected again which is the first
half to test the new UFFDIO_WRITEPROTECT call. Then we continue
with the 2nd half, which will contain both MISSING and WP faulting
tests for the 2nd half and WP-only faults from the 1st half.
(2) Event/Signal test
Mostly previous tests but will do MISSING+WP for each page. For
sigbus-mode test we'll need to provide standalone path to handle the
write protection faults.
For all tests, do statistics as well for uffd-wp pages.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-20-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce uffd_stats structure for statistics of the self test, at the
same time refactor the code to always pass in the uffd_stats for either
read() or poll() typed fault handling threads instead of using two
different ways to return the statistic results. No functional change.
With the new structure, it's very easy to introduce new statistics.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-19-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- A large series from Nick for 64-bit to further rework our exception vectors,
and rewrite portions of the syscall entry/exit and interrupt return in C. The
result is much easier to follow code that is also faster in general.
- Cleanup of our ptrace code to split various parts out that had become badly
intertwined with #ifdefs over the years.
- Changes to our NUMA setup under the PowerVM hypervisor which should
hopefully avoid non-sensical topologies which can lead to warnings from the
workqueue code and other problems.
- MAINTAINERS updates to remove some of our old orphan entries and update the
status of others.
- Quite a few other small changes and fixes all over the map.
Thanks to:
Abdul Haleem, afzal mohammed, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
Kumar K.V, Balamuruhan S, Cédric Le Goater, Chen Zhou, Christophe JAILLET,
Christophe Leroy, Christoph Hellwig, Clement Courbet, Daniel Axtens, David
Gibson, Douglas Miller, Fabiano Rosas, Fangrui Song, Ganesh Goudar, Gautham R.
Shenoy, Greg Kroah-Hartman, Greg Kurz, Gustavo Luiz Duarte, Hari Bathini, Ilie
Halip, Jan Kara, Joe Lawrence, Joe Perches, Kajol Jain, Larry Finger,
Laurentiu Tudor, Leonardo Bras, Libor Pechacek, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Masami Hiramatsu, Mauricio Faria de Oliveira,
Michael Neuling, Michal Suchanek, Mike Rapoport, Nageswara R Sastry, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers,
Oliver O'Halloran, Po-Hsu Lin, Pratik Rajesh Sampat, Rasmus Villemoes, Ravi
Bangoria, Roman Bolshakov, Sam Bobroff, Sandipan Das, Santosh S, Sedat Dilek,
Segher Boessenkool, Shilpasri G Bhat, Sourabh Jain, Srikar Dronamraju, Stephen
Rothwell, Tyrel Datwyler, Vaibhav Jain, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl6JypATHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgOTyD/0U90tXb3VXlQcc4OFIb8vWIj76k4Zn
ZSZ7RyOuvb5pCISBZjSK79XkR9eMHT77qagX4V41q64k4yQl8nbgLeVnwL76hLLc
IJCs23f4nsO0uqX/MhSCc5dfOOOS2i8V+OQYtsYWsH5QaG95v0cHIqVaHHMlfQxu
507GO/W5W6KTd4x008b5unQOuE51zMKlKvqEJXkT59obQFpaa2S5Wn7OzhsnarCH
YSRNxaC7vtgBKLA9wUnFh8UUbh0FbOwXBCaq4OhHMhgRihdteVBCzlcR/6c+IRbt
EoZxKzfQ0hI1z5f++kJNaRXMtUbSpM8D1HdKKHgiWjpdBSD0eu2X106KQT2R2ZOF
qhX8xPLWNzdBglA6L43AaZUu+4ayd3QrrJIkjDv/K1rCHZjfGOzSQfoZgTEBNLFA
tC0crhEfw8m98e4EwhCtekGQxdczRdLS9YvtC/h6mU2xkpA35yNSwB1/iuVQdkYD
XyrEqImAQ1PJla7NL0hxSy5ZxrBtMeKT4WZZ0BNgKXryemldg8Tuv3AEyach3BHz
eU0pIwpbnPm1JAPyrpDQ1yEf7QsD77gTPfEvilEci60R9DhvIMGAY+pt0qfME3yX
wOLp2yVBEXlRmvHk/y/+r+m4aCsmwSrikbWwmLLwAAA6JehtzFOWxTEfNpACP23V
mZyyZznsHIIE3Q==
=ARdm
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Slightly late as I had to rebase mid-week to insert a bug fix:
- A large series from Nick for 64-bit to further rework our exception
vectors, and rewrite portions of the syscall entry/exit and
interrupt return in C. The result is much easier to follow code
that is also faster in general.
- Cleanup of our ptrace code to split various parts out that had
become badly intertwined with #ifdefs over the years.
- Changes to our NUMA setup under the PowerVM hypervisor which should
hopefully avoid non-sensical topologies which can lead to warnings
from the workqueue code and other problems.
- MAINTAINERS updates to remove some of our old orphan entries and
update the status of others.
- Quite a few other small changes and fixes all over the map.
Thanks to: Abdul Haleem, afzal mohammed, Alexey Kardashevskiy, Andrew
Donnellan, Aneesh Kumar K.V, Balamuruhan S, Cédric Le Goater, Chen
Zhou, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Clement
Courbet, Daniel Axtens, David Gibson, Douglas Miller, Fabiano Rosas,
Fangrui Song, Ganesh Goudar, Gautham R. Shenoy, Greg Kroah-Hartman,
Greg Kurz, Gustavo Luiz Duarte, Hari Bathini, Ilie Halip, Jan Kara,
Joe Lawrence, Joe Perches, Kajol Jain, Larry Finger, Laurentiu Tudor,
Leonardo Bras, Libor Pechacek, Madhavan Srinivasan, Mahesh Salgaonkar,
Masahiro Yamada, Masami Hiramatsu, Mauricio Faria de Oliveira, Michael
Neuling, Michal Suchanek, Mike Rapoport, Nageswara R Sastry, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick
Desaulniers, Oliver O'Halloran, Po-Hsu Lin, Pratik Rajesh Sampat,
Rasmus Villemoes, Ravi Bangoria, Roman Bolshakov, Sam Bobroff,
Sandipan Das, Santosh S, Sedat Dilek, Segher Boessenkool, Shilpasri G
Bhat, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Tyrel
Datwyler, Vaibhav Jain, YueHaibing"
* tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
powerpc: Make setjmp/longjmp signature standard
powerpc/cputable: Remove unnecessary copy of cpu_spec->oprofile_type
powerpc: Suppress .eh_frame generation
powerpc: Drop -fno-dwarf2-cfi-asm
powerpc/32: drop unused ISA_DMA_THRESHOLD
powerpc/powernv: Add documentation for the opal sensor_groups sysfs interfaces
selftests/powerpc: Fix try-run when source tree is not writable
powerpc/vmlinux.lds: Explicitly retain .gnu.hash
powerpc/ptrace: move ptrace_triggered() into hw_breakpoint.c
powerpc/ptrace: create ppc_gethwdinfo()
powerpc/ptrace: create ptrace_get_debugreg()
powerpc/ptrace: split out ADV_DEBUG_REGS related functions.
powerpc/ptrace: move register viewing functions out of ptrace.c
powerpc/ptrace: split out TRANSACTIONAL_MEM related functions.
powerpc/ptrace: split out SPE related functions.
powerpc/ptrace: split out ALTIVEC related functions.
powerpc/ptrace: split out VSX related functions.
powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET
powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64
powerpc/ptrace: remove unused header includes
...
- The ring buffer is no longer disabled when reading the trace file.
The trace_pipe file was made to be used for live tracing and reading
as it acted like the normal producer/consumer. As the trace file
would not consume the data, the easy way of handling it was to just
disable writes to the ring buffer. This came to a surprise to the
BPF folks who complained about lost events due to reading.
This is no longer an issue. If someone wants to keep the old disabling
there's a new option "pause-on-trace" that can be set.
- New set_ftrace_notrace_pid file. PIDs in this file will not be traced
by the function tracer. Similar to set_ftrace_pid, which makes the
function tracer only trace those tasks with PIDs in the file, the
set_ftrace_notrace_pid does the reverse.
- New set_event_notrace_pid file. PIDs in this file will cause events
not to be traced if triggered by a task with a matching PID.
Similar to the set_event_pid file but will not be traced.
Note, sched_waking and sched_switch events may still be trace if
one of the tasks referenced by those events contains a PID that
is allowed to be traced.
Tracing related features:
- New bootconfig option, that is attached to the initrd file.
If bootconfig is on the command line, then the initrd file
is searched looking for a bootconfig appended at the end.
- New GPU tracepoint infrastructure to help the gfx drivers to get
off debugfs (acked by Greg Kroah-Hartman)
Other minor updates and fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXokgWRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qgrHAP0UkKs/52JY4oWa3OIh/OqK+vnCrIwz
zGvDFOYM0fKbwgD9FZWgzlcaYK5m2Cxlhp4VoraZveHMLJUhnEHtdX6X0wk=
=Rebj
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New tracing features:
- The ring buffer is no longer disabled when reading the trace file.
The trace_pipe file was made to be used for live tracing and
reading as it acted like the normal producer/consumer. As the trace
file would not consume the data, the easy way of handling it was to
just disable writes to the ring buffer.
This came to a surprise to the BPF folks who complained about lost
events due to reading. This is no longer an issue. If someone wants
to keep the old disabling there's a new option "pause-on-trace"
that can be set.
- New set_ftrace_notrace_pid file. PIDs in this file will not be
traced by the function tracer.
Similar to set_ftrace_pid, which makes the function tracer only
trace those tasks with PIDs in the file, the set_ftrace_notrace_pid
does the reverse.
- New set_event_notrace_pid file. PIDs in this file will cause events
not to be traced if triggered by a task with a matching PID.
Similar to the set_event_pid file but will not be traced. Note,
sched_waking and sched_switch events may still be traced if one of
the tasks referenced by those events contains a PID that is allowed
to be traced.
Tracing related features:
- New bootconfig option, that is attached to the initrd file.
If bootconfig is on the command line, then the initrd file is
searched looking for a bootconfig appended at the end.
- New GPU tracepoint infrastructure to help the gfx drivers to get
off debugfs (acked by Greg Kroah-Hartman)
And other minor updates and fixes"
* tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
tracing: Do not allocate buffer in trace_find_next_entry() in atomic
tracing: Add documentation on set_ftrace_notrace_pid and set_event_notrace_pid
selftests/ftrace: Add test to test new set_event_notrace_pid file
selftests/ftrace: Add test to test new set_ftrace_notrace_pid file
tracing: Create set_event_notrace_pid to not trace tasks
ftrace: Create set_ftrace_notrace_pid to not trace tasks
ftrace: Make function trace pid filtering a bit more exact
ftrace/kprobe: Show the maxactive number on kprobe_events
tracing: Have the document reflect that the trace file keeps tracing enabled
ring-buffer/tracing: Have iterator acknowledge dropped events
tracing: Do not disable tracing when reading the trace file
ring-buffer: Do not disable recording when there is an iterator
ring-buffer: Make resize disable per cpu buffer instead of total buffer
ring-buffer: Optimize rb_iter_head_event()
ring-buffer: Do not die if rb_iter_peek() fails more than thrice
ring-buffer: Have rb_iter_head_event() handle concurrent writer
ring-buffer: Add page_stamp to iterator for synchronization
ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance()
ring-buffer: Have ring_buffer_empty() not depend on tracing stopped
tracing: Save off entry when peeking at next entry
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXoSOqgAKCRCRxhvAZXjc
omfTAQDlbSZkzNfEWcxi+WE9tg+Y5tHR7hjdsK99Mb3vhAh5GwD+LT0EOI3WKoNY
MQUp2RaR7YwWL/pktd9N1EX8qa5IPg4=
=zvBX
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread updates from Christian Brauner:
"The main change for this cycle was the extension for clone3() to
support spawning processes directly into cgroups via CLONE_INTO_CGROUP
(commit ef2c41cf38a7: "clone3: allow spawning processes
into cgroups").
But since I had to touch kernel/cgroup/ quite a bit I had Tejun route
that through his tree this time around to make it easier for him to
handle other changes.
So here is just the unexciting leftovers: a regression test for the
ENOMEM regression we fixed in commit b26ebfe12f ("pid: Fix error
return value in some cases") verifying that we report ENOMEM when
trying to create a new process in a pid namespace whose init
process/subreaper has already exited"
* tag 'threads-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: add pid namespace ENOMEM regression test
The tm-poison test includes inline asm which is 64-bit only, so the
test must be built 64-bit in order to work.
Otherwise it fails, eg:
# file tm-poison
tm-poison: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV) ...
# ./tm-poison
test: tm_poison_test
Unknown value 0x1fff71150 leaked into f31!
Unknown value 0x1fff710c0 leaked into vr31!
failure: tm_poison_test
Fixes: a003365cab ("powerpc/tm: Add tm-poison test")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200403095656.3772005-1-mpe@ellerman.id.au
Here is the big set of char/misc/other driver patches for 5.7-rc1.
Lots of things in here, and it's later than expected due to some reverts
to resolve some reported issues. All is now clean with no reported
problems in linux-next.
Included in here is:
- interconnect updates
- mei driver updates
- uio updates
- nvmem driver updates
- soundwire updates
- binderfs updates
- coresight updates
- habanalabs updates
- mhi new bus type and core
- extcon driver updates
- some Kconfig cleanups
- other small misc driver cleanups and updates
As mentioned, all have been in linux-next for a while, and with the last
two reverts, all is calm and good.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodfvA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynzCQCfROhar3E8EhYEqSOP6xq6uhX9uegAnRgGY2rs
rN4JJpOcTddvZcVlD+vo
=ocWk
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of char/misc/other driver patches for 5.7-rc1.
Lots of things in here, and it's later than expected due to some
reverts to resolve some reported issues. All is now clean with no
reported problems in linux-next.
Included in here is:
- interconnect updates
- mei driver updates
- uio updates
- nvmem driver updates
- soundwire updates
- binderfs updates
- coresight updates
- habanalabs updates
- mhi new bus type and core
- extcon driver updates
- some Kconfig cleanups
- other small misc driver cleanups and updates
As mentioned, all have been in linux-next for a while, and with the
last two reverts, all is calm and good"
* tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (174 commits)
Revert "driver core: platform: Initialize dma_parms for platform devices"
Revert "amba: Initialize dma_parms for amba devices"
amba: Initialize dma_parms for amba devices
driver core: platform: Initialize dma_parms for platform devices
bus: mhi: core: Drop the references to mhi_dev in mhi_destroy_device()
bus: mhi: core: Initialize bhie field in mhi_cntrl for RDDM capture
bus: mhi: core: Add support for reading MHI info from device
misc: rtsx: set correct pcr_ops for rts522A
speakup: misc: Use dynamic minor numbers for speakup devices
mei: me: add cedar fork device ids
coresight: do not use the BIT() macro in the UAPI header
Documentation: provide IBM contacts for embargoed hardware
nvmem: core: remove nvmem_sysfs_get_groups()
nvmem: core: use is_bin_visible for permissions
nvmem: core: use device_register and device_unregister
nvmem: core: add root_only member to nvmem device struct
extcon: axp288: Add wakeup support
extcon: Mark extcon_get_edev_name() function as exported symbol
extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
...
Here are 3 SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your current
tree, that should be trivial to handle (one file modified by two things,
one file deleted.)
All 3 of these have been in linux-next for a while, with no reported
issues other than the merge conflict.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm
m5AuCzO3Azt9KBi7NL+L
=2Lm5
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are three SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your
current tree, that should be trivial to handle (one file modified by
two things, one file deleted.)
All three of these have been in linux-next for a while, with no
reported issues other than the merge conflict"
* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
ASoC: MT6660: make spdxcheck.py happy
.gitignore: add SPDX License Identifier
.gitignore: remove too obvious comments
Pull cgroup updates from Tejun Heo:
- Christian extended clone3 so that processes can be spawned into
cgroups directly.
This is not only neat in terms of semantics but also avoids grabbing
the global cgroup_threadgroup_rwsem for migration.
- Daniel added !root xattr support to cgroupfs.
Userland already uses xattrs on cgroupfs for bookkeeping. This will
allow delegated cgroups to support such usages.
- Prateek tried to make cpuset hotplug handling synchronous but that
led to possible deadlock scenarios. Reverted.
- Other minor changes including release_agent_path handling cleanup.
* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: Document the cpuset_v2_mode mount option
Revert "cpuset: Make cpuset hotplug synchronous"
cgroupfs: Support user xattrs
kernfs: Add option to enable user xattrs
kernfs: Add removed_size out param for simple_xattr_set
kernfs: kvmalloc xattr value instead of kmalloc
cgroup: Restructure release_agent_path handling
selftests/cgroup: add tests for cloning into cgroups
clone3: allow spawning processes into cgroups
cgroup: add cgroup_may_write() helper
cgroup: refactor fork helpers
cgroup: add cgroup_get_from_file() helper
cgroup: unify attach permission checking
cpuset: Make cpuset hotplug synchronous
cgroup.c: Use built-in RCU list checking
kselftest/cgroup: add cgroup destruction test
cgroup: Clean up css_set task traversal
The test was previously using an mprotect on the heap memory allocated
using malloc and was expecting the allocation to be always using
sbrk(2). This is, however, not always true and in certain conditions
malloc may end up using anonymous mmaps for heap alloctions. This means
that the following condition that is used in the "lsm/file_mprotect"
program is not sufficent to detect all mprotect calls done on heap
memory:
is_heap = (vma->vm_start >= vma->vm_mm->start_brk &&
vma->vm_end <= vma->vm_mm->brk);
The test is updated to use an mprotect on memory allocated on the stack.
While this would result in the splitting of the vma, this happens only
after the security_file_mprotect hook. So, the condition used in the BPF
program holds true.
Fixes: 03e54f100d ("bpf: lsm: Add selftests for BPF_PROG_TYPE_LSM")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200402200751.26372-1-kpsingh@chromium.org
* GICv4.1 support
* 32bit host removal
PPC:
* secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
* allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
* New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require bulk
modification of the page tables.
* Initial work on making nested SVM event injection more similar to VMX,
and less buggy.
* Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in function
names which occasionally means eptp, KVM too has standardized on "pgd".
* A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
* Some removal of pointer chasing from kvm_x86_ops, which will also be
switched to static calls as soon as they are available.
* New Tigerlake CPUID features.
* More bugfixes, optimizations and cleanups.
Generic:
* selftests: cleanups, new MMU notifier stress test, steal-time test
* CSV output for kvm_stat.
KVM/MIPS has been broken since 5.5, it does not compile due to a patch committed
by MIPS maintainers. I had already prepared a fix, but the MIPS maintainers
prefer to fix it in generic code rather than KVM so they are taking care of it.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6GOnIUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfxwf/ZKLZiRoaovXCOG71M/eHtQb8ZIqU
3MPy+On3eC5Sk/aBxWUL9EFZsbYG6kYdbZ1VOvG9XPBoLlnkDSm/IR0kaELHtnjj
oGVda/tvGn46Ne39y8xBptmb91WDcWH0vFthT/CwlMxAw3xjr+gG7Qyo+8F2CW6m
SSSuLiHSBnyO1cQKruBTHZ8qnR8LlnfXEqtd6Y4LFLic0LbLIoIdRcT3wjQrcZrm
Djd7wbTEYZjUfoqZ72ekwEDUsONcDLDSKcguDO9pSMSCGhpxCVT5Vy68KRpoIMs2
nzNWDKjvqQo5zb2+GWxJgkd12Hv+n7PCXZMbVrWBu1pQsewUns9m4mkpGw==
=6fGt
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- GICv4.1 support
- 32bit host removal
PPC:
- secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
- allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
- New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require
bulk modification of the page tables.
- Initial work on making nested SVM event injection more similar to
VMX, and less buggy.
- Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in
function names which occasionally means eptp, KVM too has
standardized on "pgd".
- A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
- Some removal of pointer chasing from kvm_x86_ops, which will also
be switched to static calls as soon as they are available.
- New Tigerlake CPUID features.
- More bugfixes, optimizations and cleanups.
Generic:
- selftests: cleanups, new MMU notifier stress test, steal-time test
- CSV output for kvm_stat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
x86/kvm: fix a missing-prototypes "vmread_error"
KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
KVM: VMX: Add a trampoline to fix VMREAD error handling
KVM: SVM: Annotate svm_x86_ops as __initdata
KVM: VMX: Annotate vmx_x86_ops as __initdata
KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
KVM: VMX: Configure runtime hooks using vmx_x86_ops
KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
KVM: x86: Move init-only kvm_x86_ops to separate struct
KVM: Pass kvm_init()'s opaque param to additional arch funcs
s390/gmap: return proper error code on ksm unsharing
KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
KVM: Fix out of range accesses to memslots
KVM: X86: Micro-optimize IPI fastpath delay
KVM: X86: Delay read msr data iff writes ICR MSR
KVM: PPC: Book3S HV: Add a capability for enabling secure guests
KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
...
Merge updates from Andrew Morton:
"A large amount of MM, plenty more to come.
Subsystems affected by this patch series:
- tools
- kthread
- kbuild
- scripts
- ocfs2
- vfs
- mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
hugetlbfs, hugetlb"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
selftests/vm: fix map_hugetlb length used for testing read and write
mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
mm/hugetlb.c: clean code by removing unnecessary initialization
hugetlb_cgroup: add hugetlb_cgroup reservation docs
hugetlb_cgroup: add hugetlb_cgroup reservation tests
hugetlb: support file_region coalescing again
hugetlb_cgroup: support noreserve mappings
hugetlb_cgroup: add accounting for shared mappings
hugetlb: disable region_add file_region coalescing
hugetlb_cgroup: add reservation accounting for private mappings
mm/hugetlb_cgroup: fix hugetlb_cgroup migration
hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
hugetlb_cgroup: add hugetlb_cgroup reservation counter
hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
mm/memblock.c: remove redundant assignment to variable max_addr
mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
...
Pull exec/proc updates from Eric Biederman:
"This contains two significant pieces of work: the work to sort out
proc_flush_task, and the work to solve a deadlock between strace and
exec.
Fixing proc_flush_task so that it no longer requires a persistent
mount makes improvements to proc possible. The removal of the
persistent mount solves an old regression that that caused the hidepid
mount option to only work on remount not on mount. The regression was
found and reported by the Android folks. This further allows Alexey
Gladkov's work making proc mount options specific to an individual
mount of proc to move forward.
The work on exec starts solving a long standing issue with exec that
it takes mutexes of blocking userspace applications, which makes exec
extremely deadlock prone. For the moment this adds a second mutex with
a narrower scope that handles all of the easy cases. Which makes the
tricky cases easy to spot. With a little luck the code to solve those
deadlocks will be ready by next merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
signal: Extend exec_id to 64bits
pidfd: Use new infrastructure to fix deadlocks in execve
perf: Use new infrastructure to fix deadlocks in execve
proc: io_accounting: Use new infrastructure to fix deadlocks in execve
proc: Use new infrastructure to fix deadlocks in execve
kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
kernel: doc: remove outdated comment cred.c
mm: docs: Fix a comment in process_vm_rw_core
selftests/ptrace: add test cases for dead-locks
exec: Fix a deadlock in strace
exec: Add exec_update_mutex to replace cred_guard_mutex
exec: Move exec_mmap right after de_thread in flush_old_exec
exec: Move cleanup of posix timers on exec out of de_thread
exec: Factor unshare_sighand out of de_thread and call it separately
exec: Only compute current once in flush_old_exec
pid: Improve the comment about waiting in zap_pid_ns_processes
proc: Remove the now unnecessary internal mount of proc
uml: Create a private mount of proc for mconsole
uml: Don't consult current to find the proc_mnt in mconsole_proc
proc: Use a list of inodes to flush from proc
...
Commit fa7b9a805c ("tools/selftest/vm: allow choosing mem size and page
size in map_hugetlb") added the possibility to change the size of memory
mapped for the test, but left the read and write test using the default
value. This is unnoticed when mapping a length greater than the default
one, but segfaults otherwise.
Fix read_bytes() and write_bytes() by giving them the real length.
Also fix the call to munmap().
Fixes: fa7b9a805c ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/9a404a13c871c4bd0ba9ede68f69a1225180dd7e.1580978385.git.christophe.leroy@c-s.fr
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The tests use both shared and private mapped hugetlb memory, and monitors
the hugetlb usage counter as well as the hugetlb reservation counter.
They test different configurations such as hugetlb memory usage via
hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without
MAP_POPULATE.
Also add test for hugetlb reservation reparenting, since this is a subtle
issue.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sandipan Das <sandipan@linux.ibm.com> [powerpc64]
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200211213128.73302-8-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It was noticed that mlock2 tests are failing after 9c4e6b1a70 ("mm,
mlock, vmscan: no more skipping pagevecs") because the patch has changed
the timing on when the page is added to the unevictable LRU list and thus
gains the unevictable page flag.
The test was just too dependent on the implementation details which were
true at the time when it was introduced. Page flags and the timing when
they are set is something no userspace should ever depend on. The test
should be testing only for the user observable contract of the tested
syscalls. Those are defined pretty well for the mlock and there are other
means for testing them. In fact this is already done and testing for page
flags can be safely dropped to achieve the aimed purpose. Present bits
can be checked by /proc/<pid>/smaps RSS field and the locking state by
VmFlags although I would argue that Locked: field would be more
appropriate.
Drop all the page flag machinery and considerably simplify the test. This
should be more robust for future kernel changes while checking the
promised contract is still valid.
Fixes: 9c4e6b1a70 ("mm, mlock, vmscan: no more skipping pagevecs")
Reported-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200324154218.GS19542@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's good to have basic unit test coverage of the new FOLL_PIN behavior.
Fortunately, the gup_benchmark unit test is extremely fast (a few
milliseconds), so adding it the the run_vmtests suite is going to cause no
noticeable change in running time.
So, add two new invocations to run_vmtests:
1) Run gup_benchmark with normal get_user_pages().
2) Run gup_benchmark with pin_user_pages(). This is much like the
first call, except that it sets FOLL_PIN.
Running these two in quick succession also provide a visual comparison of
the running times, which is convenient.
The new invocations are fairly early in the run_vmtests script, because
with test suites, it's usually preferable to put the shorter, faster tests
first, all other things being equal.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200211001536.1027652-11-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Up until now, gup_benchmark supported testing of the following kernel
functions:
* get_user_pages(): via the '-U' command line option
* get_user_pages_longterm(): via the '-L' command line option
* get_user_pages_fast(): as the default (no options required)
Add test coverage for the new corresponding pin_*() functions:
* pin_user_pages_fast(): via the '-a' command line option
* pin_user_pages(): via the '-b' command line option
Also, add an option for clarity: '-u' for what is now (still) the default
choice: get_user_pages_fast().
Also, for the commands that set FOLL_PIN, verify that the pages really are
dma-pinned, via the new is_dma_pinned() routine. Those commands are:
PIN_FAST_BENCHMARK : calls pin_user_pages_fast()
PIN_BENCHMARK : calls pin_user_pages()
In between the calls to pin_*() and unpin_user_pages(), check each page:
if page_maybe_dma_pinned() returns false, then WARN and return.
Do this outside of the benchmark timestamps, so that it doesn't affect
reported times.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200211001536.1027652-10-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ahci driver doesn't support error recovery, and if your root
filesystem is attached to it the eeh-basic.sh test will likely kill
your machine.
So skip any device we see using the ahci driver.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200326061144.2006522-1-mpe@ellerman.id.au
This kselftest update Linux 5.7-rc1 consists of:
- resctrl_tests for resctrl file system. resctrl isn't included in the
default TARGETS list in kselftest Makefile. It can be run manually.
- Kselftest harness improvements.
- Kselftest framework and individual test fixes to support runs on
Kernel CI rings and other environments that use relocatable build
and install features.
- Minor cleanups and typo fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl6Ev30ACgkQCwJExA0N
Qxwl+g//SZ541vt4wL9sK00Xzer5s2EpQBWLB5jfUGm3lD+rQLbeUDUw+7iwNcRP
hCZmB4NkVMBYa1ciBU+L9ax2ZdgGJnbOMqSTFSYI0MqYPH0ghNx6E1XQoaaVfj9X
18VdjORamU9PHHCv6u8s3PNEefUJUsdmbbJWq+CGf98DNxvbapFb+iH7DP/IcDjg
AjopCx9x+dckPAL6NYQbIp9LRPy8giyis92O67I+IMvS8eOFkw6afaHuzR9me2H3
8YJz/JIV9MBDqHyCJswp/SilyRtv917DWPqsvjhMMk/P5NJhhCjfFwnm5t8M2Nu0
xlaz8BA0g+Ofo/y8YIr3oOJO0yTAYq2xHJx83oRPWbg9LK9gAHS6eZeGrqD0nQI8
vhfmP+ej5tXT7yu03YgkY+7192fy0Y094od86qd0RuW47On04A9yF9vSSoRsEmSq
XA7n5ltVB9pqT/+c+Tq3Xeh7/NK/C2MnioAmaEBALXSIwFOUzlguIxpO3Uj1yBhl
rcF0QzO8efGM9jCatJUUxxM6haIsE13ttKoBCFRGavvfSztNlOGr8j558EMUDlPR
t6aaPOOiFj3OdPHb34wIDwaYjiAbCkiOdd/sVJw3fTLEB9dj+cJv/x9BFjijZnLn
zk360qL+jFnHHumeUiEP0UOMXLI9NytYpbeAO7tmfZfQVoInmNQ=
=FsXI
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
"This kselftest update consists of:
- resctrl_tests for resctrl file system. resctrl isn't included in
the default TARGETS list in kselftest Makefile. It can be run
manually.
- Kselftest harness improvements.
- Kselftest framework and individual test fixes to support runs on
Kernel CI rings and other environments that use relocatable build
and install features.
- Minor cleanups and typo fixes"
* tag 'linux-kselftest-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
selftests: enforce local header dependency in lib.mk
selftests: Fix memfd to support relocatable build (O=objdir)
selftests: Fix seccomp to support relocatable build (O=objdir)
selftests/harness: Handle timeouts cleanly
selftests/harness: Move test child waiting logic
selftests: android: Fix custom install from skipping test progs
selftests: android: ion: Fix ionmap_test compile error
selftests: Fix kselftest O=objdir build from cluttering top level objdir
selftests/seccomp: Adjust test fixture counts
selftests/ftrace: Fix typo in trigger-multihist.tc
selftests/timens: Remove duplicated include <time.h>
selftests/resctrl: fix spelling mistake "Errror" -> "Error"
selftests/resctrl: Add the test in MAINTAINERS
selftests/resctrl: Disable MBA and MBM tests for AMD
selftests/resctrl: Use cache index3 id for AMD schemata masks
selftests/resctrl: Add vendor detection mechanism
selftests/resctrl: Add Cache Allocation Technology (CAT) selftest
selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest
selftests/resctrl: Add MBA test
selftests/resctrl: Add MBM test
...
We added a usage of try-run to pmu/ebb/Makefile to detect if the
toolchain supported the -no-pie option.
This fails if we build out-of-tree and the source tree is not
writable, as try-run tries to write its temporary files to the current
directory. That leads to the -no-pie option being silently dropped,
which leads to broken executables with some toolchains.
If we remove the redirect to /dev/null in try-run, we see the error:
make[3]: Entering directory '/linux/tools/testing/selftests/powerpc/pmu/ebb'
/usr/bin/ld: cannot open output file .54.tmp: Read-only file system
collect2: error: ld returned 1 exit status
make[3]: Nothing to be done for 'all'.
And looking with strace we see it's trying to use a file that's in the
source tree:
lstat("/linux/tools/testing/selftests/powerpc/pmu/ebb/.54.tmp", 0x7ffffc0f83c8)
We can fix it by setting TMPOUT to point to the $(OUTPUT) directory,
and we can verify with strace it's now trying to write to the output
directory:
lstat("/output/kselftest/powerpc/pmu/ebb/.54.tmp", 0x7fffd1bf6bf8)
And also see that the -no-pie option is now correctly detected.
Fixes: 0695f8bca9 ("selftests/powerpc: Handle Makefile for unrecognized option")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200327095319.2347641-1-mpe@ellerman.id.au
Pull networking updates from David Miller:
"Highlights:
1) Fix the iwlwifi regression, from Johannes Berg.
2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.
3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.
4) Add TTL decrement action to openvswitch, from Matteo Croce.
5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.
6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.
7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.
8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.
9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.
10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.
11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.
12) Support bcmgenet on ACPI, from Jeremy Linton.
13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.
14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.
15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.
16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.
17) Offload FIFO qdisc in mlxsw, from Petr Machata.
18) Support UDP sockets in sockmap, from Lorenz Bauer.
19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.
20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.
21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.
22) Add support for hw offload of MACSEC, from Antoine Tenart.
23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.
24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.
25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...
Pull misc x86 updates from Ingo Molnar:
- extend the decoder maps with CET instructions
- fix !vDSO corner cases
* 'x86-misc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/tests: Add CET instructions to the new instructions test
x86/insn: Add Control-flow Enforcement (CET) instructions to the opcode map
selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
selftests/x86/vdso: Fix no-vDSO segfaults
Add test cases that verify that each registered packet trap policer:
* Honors that imposed limitations of rate and burst size
* Able to police trapped packets to the specified rate
* Able to police trapped packets to the specified burst size
* Able to be unbound from its trap group
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add test cases for packet trap policer set / show commands as well as
for the binding of these policers to packet trap groups.
Both good and bad flows are tested for maximum coverage.
v2:
* Add test case with new 'fail_trap_policer_set' knob
* Add test case for partially modified trap group
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add selftests to exercise FD-based cgroup BPF program attachments and their
intermixing with legacy cgroup BPF attachments. Auto-detachment and program
replacement (both unconditional and cmpxchng-like) are tested as well.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200330030001.2312810-5-andriin@fb.com
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- Make kfree_rcu() use kfree_bulk() for added performance
- RCU updates
- Callback-overload handling updates
- Tasks-RCU KCSAN and sparse updates
- Locking torture test and RCU torture test updates
- Documentation updates
- Miscellaneous fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
rcu: Make rcu_barrier() account for offline no-CBs CPUs
rcu: Mark rcu_state.gp_seq to detect concurrent writes
Documentation/memory-barriers: Fix typos
doc: Add rcutorture scripting to torture.txt
doc/RCU/rcu: Use https instead of http if possible
doc/RCU/rcu: Use absolute paths for non-rst files
doc/RCU/rcu: Use ':ref:' for links to other docs
doc/RCU/listRCU: Update example function name
doc/RCU/listRCU: Fix typos in a example code snippets
doc/RCU/Design: Remove remaining HTML tags in ReST files
doc: Add some more RCU list patterns in the kernel
rcutorture: Set KCSAN Kconfig options to detect more data races
rcutorture: Manually clean up after rcu_barrier() failure
rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU
rcuperf: Measure memory footprint during kfree_rcu() test
rcutorture: Annotation lockless accesses to rcu_torture_current
rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch
rcutorture: Fix stray access to rcu_fwd_cb_nodelay
rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race
rcutorture: Make kvm-find-errors.sh abort on bad directory
...
Its possible to have divergent ALU32 and ALU64 bounds when using JMP32
instructins and ALU64 arithmatic operations. Sometimes the clang will
even generate this code. Because the case is a bit tricky lets add
a specific test for it.
Here is pseudocode asm version to illustrate the idea,
1 r0 = 0xffffffff00000001;
2 if w0 > 1 goto %l[fail];
3 r0 += 1
5 if w0 > 2 goto %l[fail]
6 exit
The intent here is the verifier will fail the load if the 32bit bounds
are not tracked correctly through ALU64 op. Similarly we can check the
64bit bounds are correctly zero extended after ALU32 ops.
1 r0 = 0xffffffff00000001;
2 w0 += 1
2 if r0 > 3 goto %l[fail];
6 exit
The above will fail if we do not correctly zero extend 64bit bounds
after 32bit op.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158560430155.10843.514209255758200922.stgit@john-Precision-5820-Tower
After changes to add update_reg_bounds after ALU ops and 32-bit bounds
tracking truncation of boundary crossing range will fail earlier and with
a different error message. Now the test error trace is the following
11: (17) r1 -= 2147483584
12: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=invP(id=0,smin_value=-2147483584,smax_value=63)
R10=fp0 fp-8_w=mmmmmmmm
12: (17) r1 -= 2147483584
13: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=invP(id=0,
umin_value=18446744069414584448,umax_value=18446744071562068095,
var_off=(0xffffffff00000000; 0xffffffff))
R10=fp0 fp-8_w=mmmmmmmm
13: (77) r1 >>= 8
14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=invP(id=0,
umin_value=72057594021150720,umax_value=72057594029539328,
var_off=(0xffffffff000000; 0xffffff),
s32_min_value=-16777216,s32_max_value=-1,
u32_min_value=-16777216)
R10=fp0 fp-8_w=mmmmmmmm
14: (0f) r0 += r1
value 72057594021150720 makes map_value pointer be out of bounds
Because we have 'umin_value == umax_value' instead of previously
where 'umin_value != umax_value' we can now fail earlier noting
that pointer addition is out of bounds.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158560428103.10843.6316594510312781186.stgit@john-Precision-5820-Tower
With current ALU32 subreg handling and retval refine fix from last
patches we see an expected failure in test_verifier. With verbose
verifier state being printed at each step for clarity we have the
following relavent lines [I omit register states that are not
necessarily useful to see failure cause],
#101/p bpf_get_stack return R0 within range FAIL
Failed to load prog 'Success'!
[..]
14: (85) call bpf_get_stack#67
R0_w=map_value(id=0,off=0,ks=8,vs=48,imm=0)
R3_w=inv48
15:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
15: (b7) r1 = 0
16:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
16: (bf) r8 = r0
17:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
17: (67) r8 <<= 32
18:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smax_value=9223372032559808512,
umax_value=18446744069414584320,
var_off=(0x0; 0xffffffff00000000),
s32_min_value=0,
s32_max_value=0,
u32_max_value=0,
var32_off=(0x0; 0x0))
18: (c7) r8 s>>= 32
19
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smin_value=-2147483648,
smax_value=2147483647,
var32_off=(0x0; 0xffffffff))
19: (cd) if r1 s< r8 goto pc+16
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smin_value=-2147483648,
smax_value=0,
var32_off=(0x0; 0xffffffff))
20:
R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
R1_w=inv0
R8_w=inv(id=0,smin_value=-2147483648,
smax_value=0,
R9=inv48
20: (1f) r9 -= r8
21: (bf) r2 = r7
22:
R2_w=map_value(id=0,off=0,ks=8,vs=48,imm=0)
22: (0f) r2 += r8
value -2147483648 makes map_value pointer be out of bounds
After call bpf_get_stack() on line 14 and some moves we have at line 16
an r8 bound with max_value 48 but an unknown min value. This is to be
expected bpf_get_stack call can only return a max of the input size but
is free to return any negative error in the 32-bit register space. The
C helper is returning an int so will use lower 32-bits.
Lines 17 and 18 clear the top 32 bits with a left/right shift but use
ARSH so we still have worst case min bound before line 19 of -2147483648.
At this point the signed check 'r1 s< r8' meant to protect the addition
on line 22 where dst reg is a map_value pointer may very well return
true with a large negative number. Then the final line 22 will detect
this as an invalid operation and fail the program. What we want to do
is proceed only if r8 is positive non-error. So change 'r1 s< r8' to
'r1 s> r8' so that we jump if r8 is negative.
Next we will throw an error because we access past the end of the map
value. The map value size is 48 and sizeof(struct test_val) is 48 so
we walk off the end of the map value on the second call to
get bpf_get_stack(). Fix this by changing sizeof(struct test_val) to
24 by using 'sizeof(struct test_val) / 2'. After this everything passes
as expected.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158560426019.10843.3285429543232025187.stgit@john-Precision-5820-Tower
Before this series the verifier would clamp return bounds of
bpf_get_stack() to [0, X] and this led the verifier to believe
that a JMP_JSLT 0 would be false and so would prune that path.
The result is anything hidden behind that JSLT would be unverified.
Add a test to catch this case by hiding an goto pc-1 behind the
check which will cause an infinite loop if not rejected.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158560423908.10843.11783152347709008373.stgit@john-Precision-5820-Tower
Here is the "big" set of driver core changes for 5.7-rc1.
Nothing huge in here, just lots of little firmware core changes and use
of new apis, a libfs fix, a debugfs api change, and some driver core
deferred probe rework.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXoHLIg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yle2ACgjJJzRJl9Ckae3ms+9CS4OSFFZPsAoKSrXmFc
Z7goYQdZo1zz8c0RYDrJ
=Y91m
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of driver core changes for 5.7-rc1.
Nothing huge in here, just lots of little firmware core changes and
use of new apis, a libfs fix, a debugfs api change, and some driver
core deferred probe rework.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (44 commits)
Revert "driver core: Set fw_devlink to "permissive" behavior by default"
driver core: Set fw_devlink to "permissive" behavior by default
driver core: Replace open-coded list_last_entry()
driver core: Read atomic counter once in driver_probe_done()
libfs: fix infoleak in simple_attr_read()
driver core: Add device links from fwnode only for the primary device
platform/x86: touchscreen_dmi: Add info for the Chuwi Vi8 Plus tablet
platform/x86: touchscreen_dmi: Add EFI embedded firmware info support
Input: icn8505 - Switch to firmware_request_platform for retreiving the fw
Input: silead - Switch to firmware_request_platform for retreiving the fw
selftests: firmware: Add firmware_request_platform tests
test_firmware: add support for firmware_request_platform
firmware: Add new platform fallback mechanism and firmware_request_platform()
Revert "drivers: base: power: wakeup.c: Use built-in RCU list checking"
drivers: base: power: wakeup.c: Use built-in RCU list checking
component: allow missing unbind callback
debugfs: remove return value of debugfs_create_file_size()
debugfs: Check module state before warning in {full/open}_proxy_open()
firmware: fix a double abort case with fw_load_sysfs_fallback
arch_topology: Fix putting invalid cpu clk
...
Add support for testing UDP sk_assign to the existing tests.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200329225342.16317-6-joe@wand.net.nz
Attach a tc direct-action classifier to lo in a fresh network
namespace, and rewrite all connection attempts to localhost:4321
to localhost:1234 (for port tests) and connections to unreachable
IPv4/IPv6 IPs to the local socket (for address tests). Includes
implementations for both TCP and UDP.
Keep in mind that both client to server and server to client traffic
passes the classifier.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200329225342.16317-5-joe@wand.net.nz
Co-authored-by: Joe Stringer <joe@wand.net.nz>
- allow TSYNC and USER_NOTIF together (Tycho Andersen)
- Add missing compat_ioctl for notify (Sven Schnelle)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl6BcbwWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJuhfD/9NjC6h+l9YNHW2O3bYPxDDjkJq
1aRInf+/UayTnfwhlZiJj2FFYyPR1gfZXB9CPcniYO/t6tsCdc+0kQdc3uUCPb0y
ClPp5Pc/u/SwEFgrj5gv/NsEwAVaTPy1ioagefZMENQXj77XcfifF5Mrave+lR3K
TiZsFItucIRTiEb8YY4xF/t5rn/lBvAqDiYNZwYYVcopnW3kgvOljz6ZRyOstV/B
J9QrErFfDH9SzPfK/1bZ5GbCUsTRzbGXA281UBhZdkJQaA3yoqK+yv/xKtoaX0WK
uxLPt2BG3qb21+8JZacJ2L6KQAwm5EdT+OyLyFzUYki23LsJNHEb+UpkoRnyJg5H
sSSZRj14WH5aK1REGTDLr5tgx5lxkXx/iuxYc4tuM56ToWS4hXNiQFU2cUcdqjSO
bSKVg1LO9FfTTMecYXUqljoOwAKMVra2nDNCpvkBr/1JMVFZjCfWpjy3ZvHHwpqt
BpxgfJW250HfnMWpa7k5p6bIP+WMwetwP1yGZx6xNz8j3ZSshIPUqCvTU6zD89CN
RXHMfnZOxNtq1biI41Ppc5/kCt2t4598BaGsWIWcjhhY8p5Ttq+HGs3tOPsuUXen
ccGAa/1Co0u5CCxudG4nZ2a/ooeijMx7D5HfvoYvHDQbugR/x4aSZuiw7JiTKBHr
EZCFZyxvFVqnqlzQ2Q==
=Ejyd
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"A couple of seccomp updates. They're both mostly bug fixes that I
wanted to have sit in linux-next for a while:
- allow TSYNC and USER_NOTIF together (Tycho Andersen)
- add missing compat_ioctl for notify (Sven Schnelle)"
* tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Add missing compat_ioctl for notify
seccomp: allow TSYNC and USER_NOTIF together
Health reporters should be registered with auto recover set to true.
Align dummy reporter behaviour with that, as in later patch the option to
set auto recover behaviour will be removed.
In addition, align netdevsim selftest to the new default value.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'pm_nl_ctl' was adding a trailing whitespace after having printed the
IP. But at the end, the IP element is currently always the last one.
The bash script launching 'pm_nl_ctl' had trailing whitespaces in the
expected result on purpose. But these whitespaces have been removed when
the patch has been applied upstream. To avoid trailing whitespaces in
the bash code, 'pm_nl_ctl' and expected results have now been adapted.
The MPTCP PM selftest can now pass again.
Fixes: eedbc68532 (selftests: add PM netlink functional tests)
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the pm netlink to configure the creation of several
subflows, and verify that via MIB counters.
Update the mptcp_connect program to allow reliable MP_JOIN
handshake even on small data file
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduces basic self-tests for the PM netlink,
checking the basic APIs and possible exceptional
values.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For historical reasons, there are several timestamping selftest targets
in selftests/networking/timestamping. Move them to the standard
directory for networking tests: selftests/net.
Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Load/attach a BPF program that hooks to file_mprotect (int)
and bprm_committed_creds (void).
* Perform an action that triggers the hook.
* Verify if the audit event was received using the shared global
variables for the process executed.
* Verify if the mprotect returns a -EPERM.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Florent Revest <revest@google.com>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200329004356.27286-8-kpsingh@chromium.org
This adds a test to exercise the new bpf_map__set_initial_value() function.
The test simply overrides the global data section with all zeroes, and
checks that the new value makes it into the kernel map on load.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200329132253.232541-2-toke@redhat.com
A new file was added to the tracing directory that will allow a user to
place a PID into it and the task associated to that PID will not have its
events traced. If the event-fork option is enabled, then neither will the
children of that task have its events traced.
Cc: linux-kselftest@vger.kernel.org
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
A new file was added to the tracing directory that will allow a user to
place a PID into it and the task associated to that PID will not be traced
by the function tracer. If the function-fork option is enabled, then neither
will the children of that task be traced by the function tracer.
Cc: linux-kselftest@vger.kernel.org
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Implement the .snapshot region operation for the dummy data region. This
enables a region snapshot to be taken upon request via the new
DEVLINK_CMD_REGION_SNAPSHOT command.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the test checks the observable effect of skbedit priority:
queueing of packets at the correct qdisc band. It therefore misses the fact
that the counters for offloaded rules are not updated. Add an extra check
for the counter.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add local header dependency in lib.mk. This enforces the dependency
blindly even when a test doesn't include the file, with the benefit
of a simpler common logic without requiring individual tests to have
special rule for it.
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix memfd to support relocatable build (O=objdir). This calls out
source files necessary to build tests and simplfies the dependency
enforcement.
Tested the following:
Note that cross-build for fuse_mnt has dependency on -lfuse.
make all
make clean
make kselftest-install O=/arm64_build/ ARCH=arm64 HOSTCC=gcc \
CROSS_COMPILE=aarch64-linux-gnu- TARGETS=memfd
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix seccomp relocatable builds. This is a simple fix to use the
right lib.mk variable TEST_GEN_PROGS. Local header dependency
is addressed in a change to lib.mk as a framework change that
enforces the dependency without requiring changes to individual
tests.
The following use-cases work with this change:
In seccomp directory:
make all and make clean
From top level from main Makefile:
make kselftest-install O=objdir ARCH=arm64 HOSTCC=gcc \
CROSS_COMPILE=aarch64-linux-gnu- TARGETS=seccomp
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When a selftest would timeout before, the program would just fall over
and no accounting of failures would be reported (i.e. it would result in
an incomplete TAP report). Instead, add an explicit SIGALRM handler to
cleanly catch and report the timeout.
Before:
[==========] Running 2 tests from 2 test cases.
[ RUN ] timeout.finish
[ OK ] timeout.finish
[ RUN ] timeout.too_long
Alarm clock
After:
[==========] Running 2 tests from 2 test cases.
[ RUN ] timeout.finish
[ OK ] timeout.finish
[ RUN ] timeout.too_long
timeout.too_long: Test terminated by timeout
[ FAIL ] timeout.too_long
[==========] 1 / 2 tests passed.
[ FAILED ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In order to better handle timeout failures, rearrange the child waiting
logic into a separate function. This is mostly a copy/paste with an
indentation change. To handle pid tracking, a new field is added for
the child pid. Also move the alarm() pairing into the function.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When DSCP is updated through an offloaded pedit action, DSCP rewrite on
egress should be disabled. Add a test that check that it is so.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test that runs packets with dsfield set, and test that pedit adjusts
the DSCP or ECN parts or the whole field.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a copy-paste typo in a comment and error message.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320205546.2396-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After changes to add update_reg_bounds after ALU ops and adding ALU32
bounds tracking the error message is changed in the 32-bit right shift
tests.
Test "#70/u bounds check after 32-bit right shift with 64-bit input FAIL"
now fails with,
Unexpected error message!
EXP: R0 invalid mem access
RES: func#0 @0
7: (b7) r1 = 2
8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP2 R10=fp0 fp-8_w=mmmmmmmm
8: (67) r1 <<= 31
9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967296 R10=fp0 fp-8_w=mmmmmmmm
9: (74) w1 >>= 31
10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
10: (14) w1 -= 2
11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967294 R10=fp0 fp-8_w=mmmmmmmm
11: (0f) r0 += r1
math between map_value pointer and 4294967294 is not allowed
And test "#70/p bounds check after 32-bit right shift with 64-bit input
FAIL" now fails with,
Unexpected error message!
EXP: R0 invalid mem access
RES: func#0 @0
7: (b7) r1 = 2
8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv2 R10=fp0 fp-8_w=mmmmmmmm
8: (67) r1 <<= 31
9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967296 R10=fp0 fp-8_w=mmmmmmmm
9: (74) w1 >>= 31
10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv0 R10=fp0 fp-8_w=mmmmmmmm
10: (14) w1 -= 2
11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967294 R10=fp0 fp-8_w=mmmmmmmm
11: (0f) r0 += r1
last_idx 11 first_idx 0
regs=2 stack=0 before 10: (14) w1 -= 2
regs=2 stack=0 before 9: (74) w1 >>= 31
regs=2 stack=0 before 8: (67) r1 <<= 31
regs=2 stack=0 before 7: (b7) r1 = 2
math between map_value pointer and 4294967294 is not allowed
Before this series we did not trip the "math between map_value pointer..."
error because check_reg_sane_offset is never called in
adjust_ptr_min_max_vals(). Instead we have a register state that looks
like this at line 11*,
11: R0_w=map_value(id=0,off=0,ks=8,vs=8,
smin_value=0,smax_value=0,
umin_value=0,umax_value=0,
var_off=(0x0; 0x0))
R1_w=invP(id=0,
smin_value=0,smax_value=4294967295,
umin_value=0,umax_value=4294967295,
var_off=(0xfffffffe; 0x0))
R10=fp(id=0,off=0,
smin_value=0,smax_value=0,
umin_value=0,umax_value=0,
var_off=(0x0; 0x0)) fp-8_w=mmmmmmmm
11: (0f) r0 += r1
In R1 'smin_val != smax_val' yet we have a tnum_const as seen
by 'var_off(0xfffffffe; 0x0))' with a 0x0 mask. So we hit this check
in adjust_ptr_min_max_vals()
if ((known && (smin_val != smax_val || umin_val != umax_val)) ||
smin_val > smax_val || umin_val > umax_val) {
/* Taint dst register if offset had invalid bounds derived from
* e.g. dead branches.
*/
__mark_reg_unknown(env, dst_reg);
return 0;
}
So we don't throw an error here and instead only throw an error
later in the verification when the memory access is made.
The root cause in verifier without alu32 bounds tracking is having
'umin_value = 0' and 'umax_value = U64_MAX' from BPF_SUB which we set
when 'umin_value < umax_val' here,
if (dst_reg->umin_value < umax_val) {
/* Overflow possible, we know nothing */
dst_reg->umin_value = 0;
dst_reg->umax_value = U64_MAX;
} else { ...}
Later in adjust_calar_min_max_vals we previously did a
coerce_reg_to_size() which will clamp the U64_MAX to U32_MAX by
truncating to 32bits. But either way without a call to update_reg_bounds
the less precise bounds tracking will fall out of the alu op
verification.
After latest changes we now exit adjust_scalar_min_max_vals with the
more precise umin value, due to zero extension propogating bounds from
alu32 bounds into alu64 bounds and then calling update_reg_bounds.
This then causes the verifier to trigger an earlier error and we get
the error in the output above.
This patch updates tests to reflect new error message.
* I have a local patch to print entire verifier state regardless if we
believe it is a constant so we can get a full picture of the state.
Usually if tnum_is_const() then bounds are also smin=smax, etc. but
this is not always true and is a bit subtle. Being able to see these
states helps understand dataflow imo. Let me know if we want something
similar upstream.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158507161475.15666.3061518385241144063.stgit@john-Precision-5820-Tower
Overlapping header include additions in macsec.c
A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c
Overlapping test additions in selftests Makefile
Overlapping PCI ID table adjustments in iwlwifi driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
The lib files should not be defined as TEST_PROGS, or we will run them
in run_kselftest.sh.
Also remove ethtool_lib.sh exec permission.
Fixes: 81573b18f2 ("selftests/net/forwarding: add Makefile to install tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Find some tests are missed in Makefile by running:
for file in $(ls *.sh); do grep -q $file Makefile || echo $file; done
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds test cases for ptrace deadlocks.
Additionally fixes a compile problem in get_syscall_info.c,
observed with gcc-4.8.4:
get_syscall_info.c: In function 'get_syscall_info':
get_syscall_info.c:93:3: error: 'for' loop initial declarations are only
allowed in C99 mode
for (unsigned int i = 0; i < ARRAY_SIZE(args); ++i) {
^
get_syscall_info.c:93:3: note: use option -std=c99 or -std=gnu99 to compile
your code
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We recently regressed (cf. [1] and its corresponding fix in [2]) returning
ENOMEM when trying to create a process in a pid namespace whose init
process/child subreaper has already died. This has caused confusion at
least once before that (cf. [3]). Let's add a simple regression test to
catch this in the future.
[1]: 49cb2fc42c ("fork: extend clone3() to support setting a PID")
[2]: b26ebfe12f ("pid: Fix error return value in some cases")
[3]: 35f71bc0a0 ("fork: report pid reservation failure properly")
Cc: Corey Minyard <cminyard@mvista.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Some specific tests in powerpc can take longer than the default 45
seconds that added in commit 852c8cbf34 ("selftests/kselftest/runner.sh:
Add 45 second timeout per test") to run, the following test result was
collected across 2 Power8 nodes and 1 Power9 node in our pool:
powerpc/benchmarks/futex_bench - 52s
powerpc/dscr/dscr_sysfs_test - 116s
powerpc/signal/signal_fuzzer - 88s
powerpc/tm/tm_unavailable_test - 168s
powerpc/tm/tm-poison - 240s
Thus they will fail with TIMEOUT error. Disable the timeout setting
for these sub-tests to allow them finish properly.
https://bugs.launchpad.net/bugs/1864642
Fixes: 852c8cbf34 ("selftests/kselftest/runner.sh: Add 45 second timeout per test")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200318060004.10685-1-po-hsu.lin@canonical.com
The test case tm-signal-context-force-tm expects a segfault to happen
on returning from signal handler, and then does a setcontext() to run
the test again. However, the test doesn't always segfault, causing the
test to run a single time.
This patch fixes the test by putting it within a loop and jumping, via
setcontext, just prior to the loop in case it segfaults. This way we
get the desired behavior (run the test COUNT_MAX times) regardless if
it segfaults or not. This also reduces the use of setcontext for
control flow logic, keeping it only in the segfault handler.
Also, since 'count' is changed within the signal handler, it is
declared as volatile to prevent any compiler optimization getting
confused with asynchronous changes.
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-3-gustavold@linux.ibm.com
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) A new selftest for nf_queue, from Florian Westphal. This test
covers two recent fixes: 07f8e4d0fd ("tcp: also NULL skb->dev
when copy was needed") and b738a185be ("tcp: ensure skb->dev is
NULL before leaving TCP stack").
2) The fwd action breaks with ifb. For safety in next extensions,
make sure the fwd action only runs from ingress until it is extended
to be used from a different hook.
3) The pipapo set type now reports EEXIST in case of subrange overlaps.
Update the rbtree set to validate range overlaps, so far this
validation is only done only from userspace. From Stefano Brivio.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test case to check nf queue infrastructure.
Could be extended in the future to also cover serialization of
conntrack, uid and secctx attributes in nfqueue.
For now, this checks that 'queue bypass' works, that a queue rule with
no bypass option blocks traffic and that userspace receives the expected
number of packets.
For this we add two queues and hook all of
prerouting/input/forward/output/postrouting.
Packets get queued twice with a dummy base chain in between:
This passes with current nf tree, but reverting
commit 946c0d8e6e ("netfilter: nf_queue: fix reinject verdict handling")
makes this trip (it processes 30 instead of expected 20 packets).
v2: update config file with queue and other options missing/needed for
other tests.
v3: also test with tcp, this reveals problem with commit
28f8bfd1ac ("netfilter: Support iif matches in POSTROUTING"), due to
skb->dev pointing at another skb in the retransmit rbtree (skb->dev
aliases to rbnode child).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull RCU changes from Paul E. McKenney:
- Make kfree_rcu() use kfree_bulk() for added performance
- RCU updates
- Callback-overload handling updates
- Tasks-RCU KCSAN and sparse updates
- Locking torture test and RCU torture test updates
- Documentation updates
- Miscellaneous fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add missing Makefile for net/forwarding tests and include it to
the targets list, otherwise forwarding tests are not installed
in case of cross-compilation.
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
To reduce the reliance of trace samples (trace*_user) on bpf_load,
move read_trace_pipe to trace_helpers. By moving this bpf_loader helper
elsewhere, trace functions can be easily migrated to libbbpf.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200321100424.1593964-2-danieltimlee@gmail.com
This patch adds test to exercise the bpf_sk_storage_get()
and bpf_sk_storage_delete() helper from the bpf_dctcp.c.
The setup and check on the sk_storage is done immediately
before and after the connect().
This patch also takes this chance to move the pthread_create()
after the connect() has been done. That will remove the need of
the "wait_thread" label.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200320152107.2169904-1-kafai@fb.com
Many systems build/test up-to-date kernels with older libcs, and
an older glibc (2.17) lacks the definition of SOL_DCCP in
/usr/include/bits/socket.h (it was added in the 4.6 timeframe).
Adding the definition to the test program avoids a compilation
failure that gets in the way of building tools/testing/selftests/net.
The test itself will work once the definition is added; either
skipping due to DCCP not being configured in the kernel under test
or passing, so there are no other more up-to-date glibc dependencies
here it seems beyond that missing definition.
Fixes: 11fb60d108 ("selftests: net: reuseport_addr_any: add DCCP")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Statistics on timestamps is useful to quantify average and tail latency.
Print timestamp statistics in count/avg/min/max format.
Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the following new flags:
-e: use level-triggered epoll() instead of poll().
-E: use event-triggered epoll() instead of poll().
Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A longer sleep duration between sendmsg()s makes more cachelines to be
evicted and results in higher latency. Making the duration configurable.
Add the following new flags:
-S: Configurable sleep duration.
-b: Busy loop instead of poll().
Remove the following flag:
-D: No delay between packets: subsumed by -S.
Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Txtimestamp reports latencies in uses resolution, while nsec is needed
in cases such as measuring latencies on localhost.
Add the following new flag:
-N: print timestamps and durations in nsec (instead of usec)
Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrapper script txtimestamp.sh executes a pre-defined list of testcases
sequentially without configuration options available.
Add an option (-r/--run) to setup the test namespace and pass remaining
arguments to txtimestamp binary. The script still runs all tests when no
argument is passed.
Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang's -Wmisleading-indentation warns about misleading indentations if
there's a mixture of spaces and tabs. Remove extraneous spaces.
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200320201510.217169-1-morbo@google.com
Add tests cases for checking the new firmware_request_platform api.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20200115163554.101315-7-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
test_vdso would try to call a NULL pointer if the vDSO was missing.
vdso_restorer_32 hit a genuine failure: trying to use the
kernel-provided signal restorer doesn't work if the vDSO is missing.
Skip the test if the vDSO is missing, since the test adds no particular
value in that case.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/618ea7b8c55b10d08b1cb139e9a3a957934b8647.1584653439.git.luto@kernel.org
Add a test that runs traffic through a port such that skbedit priority
action acts on it during forwarding. Test that at egress, it is classified
correctly according to the new priority at a PRIO qdisc.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test triggers a TM Bad Thing by raising a signal in transactional state
and forcing a pagefault to happen in kernelspace when the kernel signal
handling code first touches the user signal stack.
This is inspired by the test tm-signal-context-force-tm but uses userfaultfd to
make the test deterministic. While this test always triggers the bug in one
run, I had to execute tm-signal-context-force-tm several times (the test runs
5000 times each execution) to trigger the same bug.
tm-signal-context-force-tm is kept instead of replaced because, while this test
is more reliable and triggers the same bug, tm-signal-context-force-tm has a
better coverage, in the sense that by running the test several times it might
trigger the pagefault and/or be preempted at different places.
v3: skip test if userfaultfd is unavailable.
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-2-gustavold@linux.ibm.com
There's two different paths through the sigreturn code, depending on
whether the VDSO is mapped or not. We recently discovered a bug in the
unmapped case, because it's not commonly used these days.
So add a test that sends itself a signal, then moves the VDSO, takes
another signal and finally unmaps the VDSO before sending itself
another signal. That tests the standard signal path, the code that
handles the VDSO being moved, and also the signal path in the case
where the VDSO is unmapped.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200304110402.6038-1-mpe@ellerman.id.au
The ftrace selftest "ftrace - test for function traceon/off triggers"
enables all events and reads the trace file. Now that the trace file does
not disable tracing, and will attempt to continually read new data that is
added, the selftest gets stuck reading the trace file. This is because the
data added to the trace file will fill up quicker than the reading of it.
By only enabling scheduling events, the read can keep up with the writes.
Instead of enabling all events, only enable the scheduler events.
Link: http://lkml.kernel.org/r/20200318111345.0516642e@gandalf.local.home
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This adds a stress test that should hopefully help us catch regressions
for [1], [2], and [3].
[1]: 2669b8b0c7 ("binder: prevent UAF for binderfs devices")
[2]: f0fe2c0f05 ("binder: prevent UAF for binderfs devices II")
[3]: 211b64e4b5 ("binderfs: use refcount for binder control devices too")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200313152420.138777-3-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Unprivileged users will be able to create directories in there. The
unprivileged test for /dev wouldn't have worked on most systems.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200313152420.138777-2-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Makes for nicer output and prepares for additional tests.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200313152420.138777-1-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We precompute the static-static ECDH during configuration time, in order
to save an expensive computation later when receiving network packets.
However, not all ECDH computations yield a contributory result. Prior,
we were just not letting those peers be added to the interface. However,
this creates a strange inconsistency, since it was still possible to add
other weird points, like a valid public key plus a low-order point, and,
like points that result in zeros, a handshake would not complete. In
order to make the behavior more uniform and less surprising, simply
allow all peers to be added. Then, we'll error out later when doing the
crypto if there's an issue. This also adds more separation between the
crypto layer and the configuration layer.
Discussed-with: Mathias Hall-Andersen <mathias@hall-andersen.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case this helps expose bugs with the newer 64-bit time_t types, we do
our testing with the newer musl that supports this as well as
CONFIG_COMPAT_32BIT_TIME=n. This matters to us, since wireguard does in
fact deal with timestamps.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit removes a duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tests for mlxsw hw_stats types.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>