The bpf map sockmap supports adding programs via attach commands. This
patch adds the detach command to keep the API symmetric and allow
users to remove previously added programs. Otherwise the user would
have to delete the map and re-add it to get in this state.
This also adds a series of additional tests to capture detach operation
and also attaching/detaching invalid prog types.
API note: socks will run (or not run) programs depending on the state
of the map at the time the sock is added. We do not for example walk
the map and remove programs from previously attached socks.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to typos in printf error messages:
"conenct" -> "connect"
"listeen" -> "listen"
thanks to Daniel Borkmann for spotting one of these mistakes
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sockmap is a bit different than normal stress tests that can run
in parallel as is. We need to reuse the same socket pool and map
pool to get good stress test cases.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When attaching a program to sockmap we need to check map type
is correct.
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some more sockmap tests to cover,
- forwarding to NULL entries
- more than two maps to test list ops
- forwarding to different map
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the initial sockmap API we provided strparser and verdict programs
using a single attach command by extending the attach API with a the
attach_bpf_fd2 field.
However, if we add other programs in the future we will be adding a
field for every new possible type, attach_bpf_fd(3,4,..). This
seems a bit clumsy for an API. So lets push the programs using two
new type fields.
BPF_SK_SKB_STREAM_PARSER
BPF_SK_SKB_STREAM_VERDICT
This has the advantage of having a readable name and can easily be
extended in the future.
Updates to samples and sockmap included here also generalize tests
slightly to support upcoming patch for multiple map support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This generates a set of sockets, attaches BPF programs, and sends some
simple traffic using basic send/recv pattern. Additionally, we do a bunch
of negative tests to ensure adding/removing socks out of the sockmap fail
correctly.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apparently through one of my revisions of the initial patches
series I lost the devmap test. We can add more testing later but
for now lets fix the simple one we have.
Fixes: 546ac1ffb7 "bpf: add devmap, a map for storing net device references"
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device map (devmap) is a BPF map, primarily useful for networking
applications, that uses a key to lookup a reference to a netdevice.
The map provides a clean way for BPF programs to build virtual port
to physical port maps. Additionally, it provides a scoping function
for the redirect action itself allowing multiple optimizations. Future
patches will leverage the map to provide batching at the XDP layer.
Another optimization/feature, that is not yet implemented, would be
to support multiple netdevices per key to support efficient multicast
and broadcast support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test case to track behaviour when traversing and updating the
htab map. We recently used such traversal, so it's quite useful to
keep it as an example in selftests.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To overcome bugs as described and fixed in 89087c456f ("bpf: Fix
values type used in test_maps"), provide a generic BPF_DECLARE_PERCPU()
and bpf_percpu() accessor macro for all percpu map values used in
tests.
Declaring variables works as follows (also works for structs):
BPF_DECLARE_PERCPU(uint32_t, my_value);
They can then be accessed normally as uint32_t type through:
bpf_percpu(my_value, <cpu_nr>)
For example:
bpf_percpu(my_value, 0)++;
Implicitly, we make sure that the passed type is allocated and aligned
by gcc at least on a 8-byte boundary, so that it works together with
the map lookup/update syscall for percpu maps. We use it as a usage
example in test_maps, so that others are free to adapt this into their
code when necessary.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.
This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.
Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maps of per-cpu type have their value element size adjusted to 8 if it
is specified smaller during various map operations.
This makes test_maps as a 32-bit binary fail, in fact the kernel
writes past the end of the value's array on the user's stack.
To be quite honest, I think the kernel should reject creation of a
per-cpu map that doesn't have a value size of at least 8 if that's
what the kernel is going to silently adjust to later.
If the user passed something smaller, it is a sizeof() calcualtion
based upon the type they will actually use (just like in this testcase
code) in later calls to the map operations.
Fixes: df570f5772 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
In both kmalloc and prealloc mode the bpf_map_update_elem() is using
per-cpu extra_elems to do atomic update when the map is full.
There are two issues with it. The logic can be misused, since it allows
max_entries+num_cpus elements to be present in the map. And alloc_extra_elems()
at map creation time can fail percpu alloc for large map values with a warn:
WARNING: CPU: 3 PID: 2752 at ../mm/percpu.c:892 pcpu_alloc+0x119/0xa60
illegal size (32824) or align (8) for percpu allocation
The fixes for both of these issues are different for kmalloc and prealloc modes.
For prealloc mode allocate extra num_possible_cpus elements and store
their pointers into extra_elems array instead of actual elements.
Hence we can use these hidden(spare) elements not only when the map is full
but during bpf_map_update_elem() that replaces existing element too.
That also improves performance, since pcpu_freelist_pop/push is avoided.
Unfortunately this approach cannot be used for kmalloc mode which needs
to kfree elements after rcu grace period. Therefore switch it back to normal
kmalloc even when full and old element exists like it was prior to
commit 6c90598174 ("bpf: pre-allocate hash map elements").
Add tests to check for over max_entries and large map values.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) The test_lru_map and test_lru_dist fails building on my machine since
the sys/resource.h header is not included.
2) test_verifier fails in one test case where we try to call an invalid
function, since the verifier log output changed wrt printing function
names.
3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
retrieving the number of possible CPUs. This is broken at least in our
scenario and really just doesn't work.
glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
if that fails, depending on the config, it either tries to count CPUs
in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
If /proc/cpuinfo has some issue, it returns just 1 worst case. This
oddity is nothing new [1], but semantics/behaviour seems to be settled.
_SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
that fails it looks into /proc/stat for cpuX entries, and if also that
fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
unlikely all breaks down).
While that might match num_possible_cpus() from the kernel in some
cases, it's really not guaranteed with CPU hotplugging, and can result
in a buffer overflow since the array in user space could have too few
number of slots, and on perpcu map lookup, the kernel will write beyond
that memory of the value buffer.
William Tu reported such mismatches:
[...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
happens when CPU hotadd is enabled. For example, in Fusion when
setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
-smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
will be 2 [2]. [...]
Documentation/cputopology.txt says /sys/devices/system/cpu/possible
outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
own implementation. Later, we could add support to bpf(2) for passing
a mask via CPU_SET(3), for example, to just select a subset of CPUs.
BPF samples code needs this fix as well (at least so that people stop
copying this). Thus, define bpf_num_possible_cpus() once in selftests
and import it from there for the sample code to avoid duplicating it.
The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.
After all three issues are fixed, the test suite runs fine again:
# make run_tests | grep self
selftests: test_verifier [PASS]
selftests: test_maps [PASS]
selftests: test_lru_map [PASS]
selftests: test_kmod.sh [PASS]
[1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html
Fixes: 3059303f59 ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes: 86af8b4191 ("Add sample for adding simple drop program to link")
Fixes: df570f5772 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes: e155967179 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes: ebb676daa1 ("bpf: Print function name in addition to function id")
Fixes: 5db58faf98 ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a start of a test suite for kernel selftests. This moves test_verifier
and test_maps over to tools/testing/selftests/bpf/ along with various
code improvements and also adds a script for invoking test_bpf module.
The test suite can simply be run via selftest framework, f.e.:
# cd tools/testing/selftests/bpf/
# make
# make run_tests
Both test_verifier and test_maps were kind of misplaced in samples/bpf/
directory and we were looking into adding them to selftests for a while
now, so it can be picked up by kbuild bot et al and hopefully also get
more exposure and thus new test case additions.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>