Sync bpf_get_socket_cookie() related bpf UAPI changes to tools/.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Update with latest changes from include/uapi/linux/bpf.h header.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To pick the rename in:
bd3a08aaa9 ("bpf: flowlabel in bpf_fib_lookup should be flowinfo")
Silencing this build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zd1sgtbybtjrrt7bqdybu0s0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-06-05
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a new BPF hook for sendmsg similar to existing hooks for bind and
connect: "This allows to override source IP (including the case when it's
set via cmsg(3)) and destination IP:port for unconnected UDP (slow path).
TCP and connected UDP (fast path) are not affected. This makes UDP support
complete, that is, connected UDP is handled by connect hooks, unconnected
by sendmsg ones.", from Andrey.
2) Rework of the AF_XDP API to allow extending it in future for type writer
model if necessary. In this mode a memory window is passed to hardware
and multiple frames might be filled into that window instead of just one
that is the case in the current fixed frame-size model. With the new
changes made this can be supported without having to add a new descriptor
format. Also, core bits for the zero-copy support for AF_XDP have been
merged as agreed upon, where i40e bits will be routed via Jeff later on.
Various improvements to documentation and sample programs included as
well, all from Björn and Magnus.
3) Given BPF's flexibility, a new program type has been added to implement
infrared decoders. Quote: "The kernel IR decoders support the most
widely used IR protocols, but there are many protocols which are not
supported. [...] There is a 'long tail' of unsupported IR protocols,
for which lircd is need to decode the IR. IR encoding is done in such
a way that some simple circuit can decode it; therefore, BPF is ideal.
[...] user-space can define a decoder in BPF, attach it to the rc
device through the lirc chardev.", from Sean.
4) Several improvements and fixes to BPF core, among others, dumping map
and prog IDs into fdinfo which is a straight forward way to correlate
BPF objects used by applications, removing an indirect call and therefore
retpoline in all map lookup/update/delete calls by invoking the callback
directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper
for tc BPF programs to have an efficient way of looking up cgroup v2 id
for policy or other use cases. Fixes to make sure we zero tunnel/xfrm
state that hasn't been filled, to allow context access wrt pt_regs in
32 bit archs for tracing, and last but not least various test cases
for fixes that landed in bpf earlier, from Daniel.
5) Get rid of the ndo_xdp_flush API and extend the ndo_xdp_xmit with
a XDP_XMIT_FLUSH flag instead which allows to avoid one indirect
call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper.
6) Add a new bpf_get_current_cgroup_id() helper that can be used in
tracing to retrieve the cgroup id from the current process in order
to allow for e.g. aggregation of container-level events, from Yonghong.
7) Two follow-up fixes for BTF to reject invalid input values and
related to that also two test cases for BPF kselftests, from Martin.
8) Various API improvements to the bpf_fib_lookup() helper, that is,
dropping MPLS bits which are not fully hashed out yet, rejecting
invalid helper flags, returning error for unsupported address
families as well as renaming flowlabel to flowinfo, from David.
9) Various fixes and improvements to sockmap BPF kselftests in particular
in proper error detection and data verification, from Prashant.
10) Two arm32 BPF JIT improvements. One is to fix imm range check with
regards to whether immediate fits into 24 bits, and a naming cleanup
to get functions related to rsh handling consistent to those handling
lsh, from Wang.
11) Two compile warning fixes in BPF, one for BTF and a false positive
to silent gcc in stack_map_get_build_id_offset(), from Arnd.
12) Add missing seg6.h header into tools include infrastructure in order
to fix compilation of BPF kselftests, from Mathieu.
13) Several formatting cleanups in the BPF UAPI helper description that
also fix an error during rst2man compilation, from Quentin.
14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is
not built into the kernel, from Yue.
15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull in recent changes from include/uapi/linux/bpf.h.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is simple test over rc-loopback.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
These are minor edits for the eBPF helpers documentation in
include/uapi/linux/bpf.h.
The main fix consists in removing "BPF_FIB_LOOKUP_", because it ends
with a non-escaped underscore that gets interpreted by rst2man and
produces the following message in the resulting manual page:
DOCUTILS SYSTEM MESSAGES
System Message: ERROR/3 (/tmp/bpf-helpers.rst:, line 1514)
Unknown target name: "bpf_fib_lookup".
Other edits consist in:
- Improving formatting for flag values for "bpf_fib_lookup()" helper.
- Emphasising a parameter name in description of the return value for
"bpf_get_stack()" helper.
- Removing unnecessary blank lines between "Description" and "Return"
sections for the few helpers that would use it, for consistency.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sync new `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG`
attach types to tools/.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sync kernel header bpf.h to tools/include/uapi/linux/bpf.h and
implement bpf_task_fd_query() in libbpf. The test programs
in samples/bpf and tools/testing/selftests/bpf, and later bpftool
will use this libbpf function to query kernel.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new test for the seg6local End.BPF action. The following helpers
are also tested:
- bpf_lwt_push_encap within the LWT BPF IN hook
- bpf_lwt_seg6_action
- bpf_lwt_seg6_adjust_srh
- bpf_lwt_seg6_store_bytes
A chain of End.BPF actions is built. The SRH is injected through a LWT
BPF IN hook before entering this chain. Each End.BPF action validates
the previous one, otherwise the packet is dropped. The test succeeds
if the last node in the chain receives the packet and the UDP datagram
contained can be retrieved from userspace.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
JITed image lengths of each function in a multi-function
program.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
addresses of the kernel symbols corresponding to each
function in a program.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch sync the uapi bpf.h and btf.h to tools.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add tests for BPF_PROG_TYPE_SK_MSG to test_verifier for read access
to new sk fields.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This runs existing SOCKMAP tests with SOCKHASH map type. To do this
we push programs into include file and build two BPF programs. One
for SOCKHASH and one for SOCKMAP.
We then run the entire test suite with each type.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sync the header from include/uapi/linux/bpf.h which was updated to add
fib lookup helper function. This fixes selftests/bpf build failure.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch sync the tools/include/uapi/linux/btf.h with
the newly introduced BTF ID support.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Only sync the header from include/uapi/linux/bpf.h.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Bring fixes for eBPF helper documentation formatting to bpf.h under
tools/ as well.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The tools header file bpf.h is synced with kernel uapi bpf.h.
The new helper is also added to bpf_helpers.h.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update tools/include/uapi/linux/bpf.h file in order to reflect the
changes for BPF helper functions documentation introduced in previous
commits.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds a new test program test_sockmap which is the old sample
sockmap program. By moving the sample program here we can now run it
as part of the self tests suite. To support this a populate_progs()
routine is added to load programs and maps which was previously done
with load_bpf_file(). This is needed because self test libs do not
provide a similar routine. Also we now use the cgroup_helpers
routines to manage cgroup use instead of manually creating one and
supplying it to the CLI.
Notice we keep the CLI around though because it is useful for dbg
and specialized testing.
To run use ./test_sockmap and the result should be,
Summary 660 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a test for fetching xfrm state parameters from a tc program running
on ingress.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch sync up the bpf.h and btf.h to tools/
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
adding selftests for bpf_xdp_adjust_tail helper. in this synthetic test
we are testing that 1) if data_end < data helper will return EINVAL
2) for normal use case packet's length would be reduced.
Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sync the following tooling headers with the latest kernel version:
tools/arch/arm/include/uapi/asm/kvm.h
- New ABI: KVM_REG_ARM_*
tools/arch/x86/include/asm/required-features.h
- Removal of NEED_LA57 dependency
tools/arch/x86/include/uapi/asm/kvm.h
- New KVM ABI: KVM_SYNC_X86_*
tools/include/uapi/asm-generic/mman-common.h
- New ABI: MAP_FIXED_NOREPLACE flag
tools/include/uapi/linux/bpf.h
- New ABI: BPF_F_SEQ_NUMBER functions
tools/include/uapi/linux/if_link.h
- New ABI: IFLA tun and rmnet support
tools/include/uapi/linux/kvm.h
- New ABI: hyperv eventfd and CONN_ID_MASK support plus header cleanups
tools/include/uapi/sound/asound.h
- New ABI: SNDRV_PCM_FORMAT_FIRST PCM format specifier
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
- The x86 system call table description changed due to the ptregs changes and the renames, in:
d5a00528b58c: syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
5ac9efa3c50d: syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
ebeb8c82ffaf: syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
Also fix the x86 syscall table warning:
-Warning: Kernel ABI header at 'tools/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
+Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
None of these changes impact existing tooling code, so we only have to copy the kernel version.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Robbins <brianrob@microsoft.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com> <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takuya Yamamoto <tkydevel@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/20180416064024.ofjtrz5yuu3ykhvl@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add selftest for attach types `BPF_CGROUP_INET4_POST_BIND` and
`BPF_CGROUP_INET6_POST_BIND`.
The main things tested are:
* prog load behaves as expected (valid/invalid accesses in prog);
* prog attach behaves as expected (load- vs attach-time attach types);
* `BPF_CGROUP_INET_SOCK_CREATE` can be attached in a backward compatible
way;
* post-hooks return expected result and errno.
Example:
# ./test_sock
Test case: bind4 load with invalid access: src_ip6 .. [PASS]
Test case: bind4 load with invalid access: mark .. [PASS]
Test case: bind6 load with invalid access: src_ip4 .. [PASS]
Test case: sock_create load with invalid access: src_port .. [PASS]
Test case: sock_create load w/o expected_attach_type (compat mode) ..
[PASS]
Test case: sock_create load w/ expected_attach_type .. [PASS]
Test case: attach type mismatch bind4 vs bind6 .. [PASS]
Test case: attach type mismatch bind6 vs bind4 .. [PASS]
Test case: attach type mismatch default vs bind4 .. [PASS]
Test case: attach type mismatch bind6 vs sock_create .. [PASS]
Test case: bind4 reject all .. [PASS]
Test case: bind6 reject all .. [PASS]
Test case: bind6 deny specific IP & port .. [PASS]
Test case: bind4 allow specific IP & port .. [PASS]
Test case: bind4 allow all .. [PASS]
Test case: bind6 allow all .. [PASS]
Summary: 16 PASSED, 0 FAILED
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add selftest to work with bpf_sock_addr context from
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` programs.
Try to bind(2) on IP:port and apply:
* loads to make sure context can be read correctly, including narrow
loads (byte, half) for IP and full-size loads (word) for all fields;
* stores to those fields allowed by verifier.
All combination from IPv4/IPv6 and TCP/UDP are tested.
Both scenarios are tested:
* valid programs can be loaded and attached;
* invalid programs can be neither loaded nor attached.
Test passes when expected data can be read from context in the
BPF-program, and after the call to bind(2) socket is bound to IP:port
pair that was written by BPF-program to the context.
Example:
# ./test_sock_addr
Attached bind4 program.
Test case #1 (IPv4/TCP):
Requested: bind(192.168.1.254, 4040) ..
Actual: bind(127.0.0.1, 4444)
Test case #2 (IPv4/UDP):
Requested: bind(192.168.1.254, 4040) ..
Actual: bind(127.0.0.1, 4444)
Attached bind6 program.
Test case #3 (IPv6/TCP):
Requested: bind(face:b00c:1234:5678::abcd, 6060) ..
Actual: bind(::1, 6666)
Test case #4 (IPv6/UDP):
Requested: bind(face:b00c:1234:5678::abcd, 6060) ..
Actual: bind(::1, 6666)
### SUCCESS
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Support setting `expected_attach_type` at prog load time in both
`bpf/bpf.h` and `bpf/libbpf.h`.
Since both headers already have API to load programs, new functions are
added not to break backward compatibility for existing ones:
* `bpf_load_program_xattr()` is added to `bpf/bpf.h`;
* `bpf_prog_load_xattr()` is added to `bpf/libbpf.h`.
Both new functions accept structures, `struct bpf_load_program_attr` and
`struct bpf_prog_load_attr` correspondingly, where new fields can be
added in the future w/o changing the API.
Standard `_xattr` suffix is used to name the new API functions.
Since `bpf_load_program_name()` is not used as heavily as
`bpf_load_program()`, it was removed in favor of more generic
`bpf_load_program_xattr()`.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
add bpf_raw_tracepoint_open(const char *name, int prog_fd) api to libbpf
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds an option to test the msg_pull_data helper. This
uses two options txmsg_start and txmsg_end to let the user
specify start and end bytes to pull.
The options can be used with txmsg_apply, txmsg_cork options
as well as with any of the basic tests, txmsg, txmsg_redir and
txmsg_drop (plus noisy variants) to run pull_data inline with
those tests. By giving user direct control over the variables
we can easily do negative testing as well as positive tests.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add sample application support for the bpf_msg_cork_bytes helper. This
lets the user specify how many bytes each verdict should apply to.
Similar to apply_bytes() tests these can be run as a stand-alone test
when used without other options or inline with other tests by using
the txmsg_cork option along with any of the basic tests txmsg,
txmsg_redir, txmsg_drop.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add sockmap option to use SK_MSG program types.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add map tests to attach BPF_PROG_TYPE_SK_MSG types to a sockmap.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
test_stacktrace_build_id() is added. It accesses tracepoint urandom_read
with "dd" and "urandom_read" and gathers stack traces. Then it reads the
stack traces from the stackmap.
urandom_read is a statically link binary that reads from /dev/urandom.
test_stacktrace_build_id() calls readelf to read build ID of urandom_read
and compares it with build ID from the stackmap.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Added a selftest for tcpbpf (sock_ops) that checks that the appropriate
callbacks occured and that it can access tcp_sock fields and that their
values are correct.
Run with command: ./test_tcpbpf_user
Adding the flag "-d" will show why it did not pass.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tell user space about device on which the map was created.
Unfortunate reality of user ABI makes sharing this code
with program offload difficult but the information is the
same.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Update tools/include/uapi/linux/bpf.h to bring it in sync with
include/uapi/linux/bpf.h. The listed commits forgot to update it.
Fixes: 02dd3291b2 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs")
Fixes: f19397a5c6 ("bpf: Add access to snd_cwnd and others in sock_ops")
Fixes: 06ef0ccb5a ("bpf/cgroup: fix a verification error for a CGROUP_DEVICE type prog")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF map offload follow similar path to program offload. At creation
time users may specify ifindex of the device on which they want to
create the map. Map will be validated by the kernel's
.map_alloc_check callback and device driver will be called for the
actual allocation. Map will have an empty set of operations
associated with it (save for alloc and free callbacks). The real
device callbacks are kept in map->offload->dev_ops because they
have slightly different signatures. Map operations are called in
process context so the driver may communicate with HW freely,
msleep(), wait() etc.
Map alloc and free callbacks are muxed via existing .ndo_bpf, and
are always called with rtnl lock held. Maps and programs are
guaranteed to be destroyed before .ndo_uninit (i.e. before
unregister_netdev() returns). Map callbacks are invoked with
bpf_devs_lock *read* locked, drivers must take care of exclusive
locking if necessary.
All offload-specific branches are marked with unlikely() (through
bpf_map_is_dev_bound()), given that branch penalty will be
negligible compared to IO anyway, and we don't want to penalize
SW path unnecessarily.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Report to the user ifindex and namespace information of offloaded
programs. If device has disappeared return -ENODEV. Specify the
namespace using dev/inode combination.
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
- recognize relocation emitted by llvm
- since all regular function will be kept in .text section and llvm
takes care of pc-relative offsets in bpf_call instruction
simply copy all of .text to relevant program section while adjusting
bpf_call instructions in program section to point to newly copied
body of instructions from .text
- do so for all programs in the elf file
- set all programs types to the one passed to bpf_prog_load()
Note for elf files with multiple programs that use different
functions in .text section we need to do 'linker' style logic.
This work is still TBD
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds a basic test for bpf_override_return to verify it works. We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This reverts commit 928631e054 ("bpftool: print program device bound
info"). We will remove this API and redo it right in -next.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
bpf_target_prog seems long and clunky, rename it to prog_ifindex.
We don't want to call this field just ifindex, because maps
may need a similar field in the future and bpf_attr members for
programs and maps are unnamed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds a basic test for bpf_override_return to verify it works. We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simple cases of overlapping changes in the packet scheduler.
Must easier to resolve this time.
Which probably means that I screwed it up somehow.
Signed-off-by: David S. Miller <davem@davemloft.net>
Cgroup v2 lacks the device controller, provided by cgroup v1.
This patch adds a new eBPF program type, which in combination
of previously added ability to attach multiple eBPF programs
to a cgroup, will provide a similar functionality, but with some
additional flexibility.
This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If program is bound to a device, print the name of the relevant
interface or unknown if the netdev has since been removed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the SPDX license tags were added a number of tooling headers got out of
sync with their kernel variants, generating lots of build warnings.
Sync them:
- tools/arch/x86/include/asm/disabled-features.h,
tools/arch/x86/include/asm/required-features.h,
tools/include/linux/hash.h:
Remove the SPDX tag where the kernel version does not have it.
- tools/include/asm-generic/bitops/__fls.h,
tools/include/asm-generic/bitops/arch_hweight.h,
tools/include/asm-generic/bitops/const_hweight.h,
tools/include/asm-generic/bitops/fls.h,
tools/include/asm-generic/bitops/fls64.h,
tools/include/uapi/asm-generic/ioctls.h,
tools/include/uapi/asm-generic/mman-common.h,
tools/include/uapi/sound/asound.h,
tools/include/uapi/linux/kvm.h,
tools/include/uapi/linux/perf_event.h,
tools/include/uapi/linux/sched.h,
tools/include/uapi/linux/vhost.h,
tools/include/uapi/sound/asound.h:
Add the SPDX tag of the respective kernel header.
- tools/include/uapi/linux/bpf_common.h,
tools/include/uapi/linux/fcntl.h,
tools/include/uapi/linux/hw_breakpoint.h,
tools/include/uapi/linux/mman.h,
tools/include/uapi/linux/stat.h,
Change the tag to the kernel header version:
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
Also sync other header details:
- include/uapi/sound/asound.h:
Fix pointless end of line whitespace noise the header grew in this cycle.
- tools/arch/x86/lib/memcpy_64.S:
Sync the code and add tools/include/asm/export.h with dummy wrappers
to support building the kernel side code in a tooling header environment.
- tools/include/uapi/asm-generic/mman.h,
tools/include/uapi/linux/bpf.h:
Sync other details that don't impact tooling's use of the ABIs.
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Smooth Cong Wang's bug fix into 'net-next'. Basically put
the bulk of the tcf_block_put() logic from 'net' into
tcf_block_put_ext(), but after the offload unbind.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that SK_REDIRECT is no longer a valid return code. Remove it
from the UAPI completely. Then do a namespace remapping internal
to sockmap so SK_REDIRECT is no longer externally visible.
Patchs primary change is to do a namechange from SK_REDIRECT to
__SK_REDIRECT
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several conflicts here.
NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
nfp_fl_output() needed some adjustments because the code block is in
an else block now.
Parallel additions to net/pkt_cls.h and net/sch_generic.h
A bug fix in __tcp_retransmit_skb() conflicted with some of
the rbtree changes in net-next.
The tc action RCU callback fixes in 'net' had some overlap with some
of the recent tcf_block reworking.
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent additions to support multiple programs in cgroups impose
a strict requirement, "all yes is yes, any no is no". To enforce
this the infrastructure requires the 'no' return code, SK_DROP in
this case, to be 0.
To apply these rules to SK_SKB program types the sk_actions return
codes need to be adjusted.
This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
SK_ABORTED to remove any chance that the API may allow aborted
program flows to be passed up the stack. This would be incorrect
behavior and allow programs to break existing policies.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix multiple build errors and warnings
1.
test_maps.c: In function ‘test_map_rdonly’:
test_maps.c:1051:30: error: ‘BPF_F_RDONLY’ undeclared (first use in this function)
MAP_SIZE, map_flags | BPF_F_RDONLY);
2.
test_maps.c:1048:6: warning: unused variable ‘i’ [-Wunused-variable]
int i, fd, key = 0, value = 0;
3.
test_maps.c:1087:2: error: called object is not a function or function pointer
assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == EPERM);
4.
./bpf_helpers.h:72:11: error: use of undeclared identifier 'BPF_FUNC_getsockopt'
(void *) BPF_FUNC_getsockopt;
Fixes: e043325b30 ("bpf: Add tests for eBPF file mode")
Fixes: 6e71b04a82 ("bpf: Add file mode configuration into bpf maps")
Fixes: cd86d1fd21 ("bpf: Adding helper function bpf_getsockops")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were quite a few overlapping sets of changes here.
Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.
Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly. If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.
In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().
Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.
The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"A little more than usual this time around. Been travelling, so that is
part of it.
Anyways, here are the highlights:
1) Deal with memcontrol races wrt. listener dismantle, from Eric
Dumazet.
2) Handle page allocation failures properly in nfp driver, from Jaku
Kicinski.
3) Fix memory leaks in macsec, from Sabrina Dubroca.
4) Fix crashes in pppol2tp_session_ioctl(), from Guillaume Nault.
5) Several fixes in bnxt_en driver, including preventing potential
NVRAM parameter corruption from Michael Chan.
6) Fix for KRACK attacks in wireless, from Johannes Berg.
7) rtnetlink event generation fixes from Xin Long.
8) Deadlock in mlxsw driver, from Ido Schimmel.
9) Disallow arithmetic operations on context pointers in bpf, from
Jakub Kicinski.
10) Missing sock_owned_by_user() check in sctp_icmp_redirect(), from
Xin Long.
11) Only TCP is supported for sockmap, make that explicit with a
check, from John Fastabend.
12) Fix IP options state races in DCCP and TCP, from Eric Dumazet.
13) Fix panic in packet_getsockopt(), also from Eric Dumazet.
14) Add missing locked in hv_sock layer, from Dexuan Cui.
15) Various aquantia bug fixes, including several statistics handling
cures. From Igor Russkikh et al.
16) Fix arithmetic overflow in devmap code, from John Fastabend.
17) Fix busted socket memory accounting when we get a fault in the tcp
zero copy paths. From Willem de Bruijn.
18) Don't leave opt->tot_len uninitialized in ipv6, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
ipv6: flowlabel: do not leave opt->tot_len with garbage
of_mdio: Fix broken PHY IRQ in case of probe deferral
textsearch: fix typos in library helpers
rxrpc: Don't release call mutex on error pointer
net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
net: stmmac: Fix stmmac_get_rx_hwtstamp()
net: stmmac: Add missing call to dev_kfree_skb()
mlxsw: spectrum_router: Configure TIGCR on init
mlxsw: reg: Add Tunneling IPinIP General Configuration Register
net: ethtool: remove error check for legacy setting transceiver type
soreuseport: fix initialization race
net: bridge: fix returning of vlan range op errors
sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
bpf: add test cases to bpf selftests to cover all access tests
bpf: fix pattern matches for direct packet access
bpf: fix off by one for range markings with L{T, E} patterns
bpf: devmap fix arithmetic overflow in bitmap_size calculation
net: aquantia: Bad udp rate on default interrupt coalescing
net: aquantia: Enable coalescing management via ethtool interface
...
SK_SKB BPF programs are run from the socket/tcp context but early in
the stack before much of the TCP metadata is needed in tcp_skb_cb. So
we can use some unused fields to place BPF metadata needed for SK_SKB
programs when implementing the redirect function.
This allows us to drop the preempt disable logic. It does however
require an API change so sk_redirect_map() has been updated to
additionally provide ctx_ptr to skb. Note, we do however continue to
disable/enable preemption around actual BPF program running to account
for map updates.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'cpumap' is primarily used as a backend map for XDP BPF helper
call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.
This patch implement the main part of the map. It is not connected to
the XDP redirect system yet, and no SKB allocation are done yet.
The main concern in this patch is to ensure the datapath can run
without any locking. This adds complexity to the setup and tear-down
procedure, which assumptions are extra carefully documented in the
code comments.
V2:
- make sure array isn't larger than NR_CPUS
- make sure CPUs added is a valid possible CPU
V3: fix nitpicks from Jakub Kicinski <kubakici@wp.pl>
V5:
- Restrict map allocation to root / CAP_SYS_ADMIN
- WARN_ON_ONCE if queue is not empty on tear-down
- Return -EPERM on memlock limit instead of -ENOMEM
- Error code in __cpu_map_entry_alloc() also handle ptr_ring_cleanup()
- Moved cpu_map_enqueue() to next patch
V6: all notice by Daniel Borkmann
- Fix err return code in cpu_map_alloc() introduced in V5
- Move cpu_possible() check after max_entries boundary check
- Forbid usage initially in check_map_func_compatibility()
V7:
- Fix alloc error path spotted by Daniel Borkmann
- Did stress test adding+removing CPUs from the map concurrently
- Fixed refcnt issue on cpu_map_entry, kthread started too soon
- Make sure packets are flushed during tear-down, involved use of
rcu_barrier() and kthread_run only exit after queue is empty
- Fix alloc error path in __cpu_map_entry_alloc() for ptr_ring
V8:
- Nitpicking comments and gramma by Edward Cree
- Fix missing semi-colon introduced in V7 due to rebasing
- Move struct bpf_cpu_map_entry members cpu+map_id to tracepoint patch
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Silences the checker:
Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
The 90caccdd8c ("bpf: fix bpf_tail_call() x64 JIT") cset only updated
a comment in uapi/bpf.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-rwx2cqbf0x1lwa1krsr6e6hd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of u8, use char for prog and map name. It can avoid the
userspace tool getting compiler's signess warning. The
bpf_prog_aux, bpf_map, bpf_attr, bpf_prog_info and
bpf_map_info are changed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf sample program trace_event is enhanced to use the new
helper to print out enabled/running time.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf sample program tracex6 is enhanced to use the new
helper to read enabled/running time as well.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
tools/include/uapi/linux/bpf.h got out of sync with actual kernel header.
Update it.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends the libbpf to provide API support to
allow specifying BPF object name.
In tools/lib/bpf/libbpf, the C symbol of the function
and the map is used. Regarding section name, all maps are
under the same section named "maps". Hence, section name
is not a good choice for map's name. To be consistent with
map, bpf_prog also follows and uses its function symbol as
the prog's name.
This patch adds logic to collect function's symbols in libbpf.
There is existing codes to collect the map's symbols and no change
is needed.
The bpf_load_program_name() and bpf_map_create_name() are
added to take the name argument. For the other bpf_map_create_xxx()
variants, a name argument is directly added to them.
In samples/bpf, bpf_load.c in particular, the symbol is also
used as the map's name and the map symbols has already been
collected in the existing code. For bpf_prog, bpf_load.c does
not collect the function symbol name. We can consider to collect
them later if there is a need to continue supporting the bpf_load.c.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looks like a couple of updates missed to get carried into tools/include/uapi/,
so copy the bpf.h header as usual to pull in latest updates.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Time for a sync with ABI/uapi headers with the upcoming v4.14 kernel.
None of the ABI changes require any source code level changes to our
existing in-kernel tooling code:
- tools/arch/s390/include/uapi/asm/kvm.h:
New KVM_S390_VM_TOD_EXT ABI, not used by in-kernel tooling.
- tools/arch/x86/include/asm/cpufeatures.h:
tools/arch/x86/include/asm/disabled-features.h:
New PCID, SME and VGIF x86 CPU feature bits defined.
- tools/include/asm-generic/hugetlb_encode.h:
tools/include/uapi/asm-generic/mman-common.h:
tools/include/uapi/linux/mman.h:
Two new madvise() flags, plus a hugetlb system call mmap flags
restructuring/extension changes.
- tools/include/uapi/drm/drm.h:
tools/include/uapi/drm/i915_drm.h:
New drm_syncobj_create flags definitions, new drm_syncobj_wait
and drm_syncobj_array ABIs. DRM_I915_PERF_* calls and a new
I915_PARAM_HAS_EXEC_FENCE_ARRAY ABI for the Intel driver.
- tools/include/uapi/linux/bpf.h:
New bpf_sock fields (::mark and ::priority), new XDP_REDIRECT
action, new kvm_ppc_smmu_info fields (::data_keys, instr_keys)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170913073823.lxmi4c7ejqlfabjx@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.
2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.
3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.
5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.
10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.
11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.
12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.
13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.
15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
In the initial sockmap API we provided strparser and verdict programs
using a single attach command by extending the attach API with a the
attach_bpf_fd2 field.
However, if we add other programs in the future we will be adding a
field for every new possible type, attach_bpf_fd(3,4,..). This
seems a bit clumsy for an API. So lets push the programs using two
new type fields.
BPF_SK_SKB_STREAM_PARSER
BPF_SK_SKB_STREAM_VERDICT
This has the advantage of having a readable name and can easily be
extended in the future.
Updates to samples and sockmap included here also generalize tests
slightly to support upcoming patch for multiple map support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the needed changes to allow each process of
the INNER_LRU_HASH_PREALLOC test to provide its numa node id
when creating the lru map.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This program binds a program to a cgroup and then matches hard
coded IP addresses and adds these to a sockmap.
This will receive messages from the backend and send them to
the client.
client:X <---> frontend:10000 client:X <---> backend:10001
To keep things simple this is only designed for 1:1 connections
using hard coded values. A more complete example would allow many
backends and clients.
To run,
# sockmap <cgroup2_dir>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=),
BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that
particularly *JLT/*JLE counterparts involving immediates need
to be rewritten from e.g. X < [IMM] by swapping arguments into
[IMM] > X, meaning the immediate first is required to be loaded
into a register Y := [IMM], such that then we can compare with
Y > X. Note that the destination operand is always required to
be a register.
This has the downside of having unnecessarily increased register
pressure, meaning complex program would need to spill other
registers temporarily to stack in order to obtain an unused
register for the [IMM]. Loading to registers will thus also
affect state pruning since we need to account for that register
use and potentially those registers that had to be spilled/filled
again. As a consequence slightly more stack space might have
been used due to spilling, and BPF programs are a bit longer
due to extra code involving the register load and potentially
required spill/fills.
Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
counterparts to the eBPF instruction set. Modifying LLVM to
remove the NegateCC() workaround in a PoC patch at [1] and
allowing it to also emit the new instructions resulted in
cilium's BPF programs that are injected into the fast-path to
have a reduced program length in the range of 2-3% (e.g.
accumulated main and tail call sections from one of the object
file reduced from 4864 to 4729 insns), reduced complexity in
the range of 10-30% (e.g. accumulated sections reduced in one
of the cases from 116432 to 88428 insns), and reduced stack
usage in the range of 1-5% (e.g. accumulated sections from one
of the object files reduced from 824 to 784b).
The modification for LLVM will be incorporated in a backwards
compatible way. Plan is for LLVM to have i) a target specific
option to offer a possibility to explicitly enable the extension
by the user (as we have with -m target specific extensions today
for various CPU insns), and ii) have the kernel checked for
presence of the extensions and enable them transparently when
the user is selecting more aggressive options such as -march=native
in a bpf target context. (Other frontends generating BPF byte
code, e.g. ply can probe the kernel directly for its code
generation.)
[1] https://github.com/borkmann/llvm/tree/bpf-insns
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add test for xdp_redirect by creating two namespaces with two
veth peers, then forward packets in-between.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 04df41e343 ("bpf: update tools/include/uapi/linux/bpf.h") the files
added in 40304b2a15 ("bpf: BPF support for sock_ops") were added to
tools/include, but not in a verbatim way, missing the comments, which
ends up triggering this warning when build tools/perf/:
make: Entering directory '/home/acme/git/linux/tools/perf'
BUILD: Doing 'make -j4' parallel build
Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
Make sure the the lines are equal, to fix the simple header copy
drift detector in tools/perf/.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 04df41e343 ("bpf: update tools/include/uapi/linux/bpf.h")
Link: http://lkml.kernel.org/n/tip-z9qyyqht9qq3yyxu76sfy0dh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In 2be7e212d5 ("bpf: add bpf_skb_adjust_room helper") BPF_ADJ_ROOM_NET was
added to include/uapi/linux/bpf.h but BPF_ADJ_ROOM_NET_OPS was added to
tools/include/uapi/linux/bpf.h, making these files differ, fix it by using the
same name in both, BPF_ADJ_ROOM_NET, the one in the kernel headers copy.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 2be7e212d5 ("bpf: add bpf_skb_adjust_room helper")
Link: http://lkml.kernel.org/n/tip-2bmwovi9lymplyz6wsszppyf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Apparently through one of my revisions of the initial patches
series I lost the devmap test. We can add more testing later but
for now lets fix the simple one we have.
Fixes: 546ac1ffb7 "bpf: add devmap, a map for storing net device references"
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work adds a helper that can be used to adjust net room of an
skb. The helper is generic and can be further extended in future.
Main use case is for having a programmatic way to add/remove room to
v4/v6 header options along with cls_bpf on egress and ingress hook
of the data path. It reuses most of the infrastructure that we added
for the bpf_skb_change_type() helper which can be used in nat64
translations. Similarly, the helper only takes care of adjusting the
room so that related data is populated and csum adapted out of the
BPF program using it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update tools/include/uapi/linux/bpf.h to include changes related to new
bpf sock_ops program type.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow for tc BPF programs to set a skb->hash, apart from clearing
and triggering a recalc that we have right now. It allows for BPF
to implement a custom hashing routine for skb_get_hash().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add test to exercise the bpf_prog/map id generation,
bpf_(prog|map)_get_next_id(), bpf_(prog|map)_get_fd_by_id() and
bpf_get_obj_info_by_fd().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit updates documentation of the bpf_perf_event_output and
bpf_perf_event_read helpers to match their implementation.
Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new field, "prog_flags", and an initial flag value
BPF_F_STRICT_ALIGNMENT.
When set, the verifier will enforce strict pointer alignment
regardless of the setting of CONFIG_EFFICIENT_UNALIGNED_ACCESS.
The verifier, in this mode, will also use a fixed value of "2" in
place of NET_IP_ALIGN.
This facilitates test cases that will exercise and validate this part
of the verifier even when run on architectures where alignment doesn't
matter.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Add napi_id access to __sk_buff for socket filter program types, tc
program types and other bpf_convert_ctx_access() users. Having access
to skb->napi_id is useful for per RX queue listener siloing, f.e.
in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
used, meaning SO_REUSEPORT enabled listeners can then select the
corresponding socket at SYN time already [1]. The skb is marked via
skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b
("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
the NAPI ID associated with the queue for steering, which requires a
prior sk_mark_napi_id() after the socket was looked up.
Semantics for the __sk_buff napi_id access are similar, meaning if
skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
then an invalid napi_id of 0 is returned to the program, otherwise a
valid non-zero napi_id.
[1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
o s/bpf_bpf_get_socket_cookie/bpf_get_socket_cookie
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
add support for BPF_PROG_TEST_RUN command to libbpf.a
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Returns the owner uid of the socket inside a sk_buff. This is useful to
perform per-UID accounting of network traffic or per-UID packet
filtering. The socket need to be a fullsock otherwise overflowuid is
returned.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retrieve the socket cookie generated by sock_gen_cookie() from a sk_buff
with a known socket. Generates a new cookie if one was not yet set.If
the socket pointer inside sk_buff is NULL, 0 is returned. The helper
function coud be useful in monitoring per socket networking traffic
statistics and provide a unique socket identifier per namespace.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test cases for array of maps and hash of maps.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tools version of this header is out of date; update it to the latest
version from kernel header.
Synchronize with the following commits:
* b95a5c4db0 ("bpf: add a longest prefix match trie map implementation")
* a5e8c07059 ("bpf: add bpf_probe_read_str helper")
* d1b662adcd ("bpf: allow option for setting bpf_l4_csum_replace from scratch")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gianluca Borello <g.borello@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tools version of this header is out of date; update it to the latest
version from the kernel headers.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: http://lkml.kernel.org/r/20161209024620.31660-2-joe@ovn.org
[ Sync it harder, after merging with what was in net-next via perf/urgent via torvalds/master to get BPG_PROG_(AT|DE)TACH, etc ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 747ea55e4f ("bpf: fix bpf_skb_in_cgroup helper naming") renames
BPF_FUNC_skb_in_cgroup to bpf_skb_under_cgroup, triggering this warning
while building perf:
Warning: tools/include/uapi/linux/bpf.h differs from kernel
Update the copy to ack that, no changes needed, as
BPF_FUNC_skb_in_cgroup isn't used so far.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-x67d2gq8ct6ko12ex14q8bbx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>