Pull networking updates from David Miller:
1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski.
2) Use bio_vec in the networking instead of custom skb_frag_t, from
Matthew Wilcox.
3) Make use of xmit_more in r8169 driver, from Heiner Kallweit.
4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen.
5) Support all variants of 5750X bnxt_en chips, from Michael Chan.
6) More RTNL avoidance work in the core and mlx5 driver, from Vlad
Buslov.
7) Add TCP syn cookies bpf helper, from Petar Penkov.
8) Add 'nettest' to selftests and use it, from David Ahern.
9) Add extack support to drop_monitor, add packet alert mode and
support for HW drops, from Ido Schimmel.
10) Add VLAN offload to stmmac, from Jose Abreu.
11) Lots of devm_platform_ioremap_resource() conversions, from
YueHaibing.
12) Add IONIC driver, from Shannon Nelson.
13) Several kTLS cleanups, from Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits)
mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
mlxsw: spectrum: Register CPU port with devlink
mlxsw: spectrum_buffers: Prevent changing CPU port's configuration
net: ena: fix incorrect update of intr_delay_resolution
net: ena: fix retrieval of nonadaptive interrupt moderation intervals
net: ena: fix update of interrupt moderation register
net: ena: remove all old adaptive rx interrupt moderation code from ena_com
net: ena: remove ena_restore_ethtool_params() and relevant fields
net: ena: remove old adaptive interrupt moderation code from ena_netdev
net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*()
net: ena: enable the interrupt_moderation in driver_supported_features
net: ena: reimplement set/get_coalesce()
net: ena: switch to dim algorithm for rx adaptive interrupt moderation
net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
ethtool: implement Energy Detect Powerdown support via phy-tunable
xen-netfront: do not assume sk_buff_head list is empty in error handling
s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb”
net: ena: don't wake up tx queue when down
drop_monitor: Better sanitize notified packets
...
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add the ability to use unaligned chunks in the AF_XDP umem. By
relaxing where the chunks can be placed, it allows to use an
arbitrary buffer size and place whenever there is a free
address in the umem. Helps more seamless DPDK AF_XDP driver
integration. Support for i40e, ixgbe and mlx5e, from Kevin and
Maxim.
2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
application can wake up the kernel for rx/tx processing which
avoids busy-spinning of the latter, useful when app and driver
is located on the same core. Support for i40e, ixgbe and mlx5e,
from Magnus and Maxim.
3) bpftool fixes for printf()-like functions so compiler can actually
enforce checks, bpftool build system improvements for custom output
directories, and addition of 'bpftool map freeze' command, from Quentin.
4) Support attaching/detaching XDP programs from 'bpftool net' command,
from Daniel.
5) Automatic xskmap cleanup when AF_XDP socket is released, and several
barrier/{read,write}_once fixes in AF_XDP code, from Björn.
6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
inclusion as well as libbpf versioning improvements, from Andrii.
7) Several new BPF kselftests for verifier precision tracking, from Alexei.
8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya.
9) And more BPF kselftest improvements all over the place, from Stanislav.
10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.
11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.
12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.
13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.
14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.
15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
Peter, Wei, Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The tep_register_trace_clock() API is used to instruct the traceevent
library how to print the event time stamps. As event print interface if
redesigned, this API is not needed any more. The new event print API is
flexible and the user can specify how the time stamps are printed.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Patrick McLean <chutzpah@gentoo.org>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-3-tz.stoyanov@gmail.com
Link: http://lore.kernel.org/lkml/20190805204355.195042846@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Libtraceevent APIs for printing various trace events information are
complicated, there are complex extra parameters. To control the way
event information is printed, the user should call a set of functions in
a specific sequence.
These APIs are reimplemented to provide a more simple interface for
printing event information.
Removed APIs:
tep_print_event_task()
tep_print_event_time()
tep_print_event_data()
tep_event_info()
tep_is_latency_format()
tep_set_latency_format()
tep_data_latency_format()
tep_set_print_raw()
A new API for printing event information is introduced:
void tep_print_event(struct tep_handle *tep, struct trace_seq *s,
struct tep_record *record, const char *fmt, ...);
where "fmt" is a printf-like format string, followed by the event
fields to be printed. Supported fields:
TEP_PRINT_PID, "%d" - event PID
TEP_PRINT_CPU, "%d" - event CPU
TEP_PRINT_COMM, "%s" - event command string
TEP_PRINT_NAME, "%s" - event name
TEP_PRINT_LATENCY, "%s" - event latency
TEP_PRINT_TIME, %d - event time stamp. A divisor and precision
can be specified as part of this format string:
"%precision.divisord". Example:
"%3.1000d" - divide the time by 1000 and print the first 3 digits
before the dot. Thus, the time stamp "123456000" will be printed as
"123.456"
TEP_PRINT_INFO, "%s" - event information.
TEP_PRINT_INFO_RAW, "%s" - event information, in raw format.
Example:
tep_print_event(tep, s, record, "%16s-%-5d [%03d] %s %6.1000d %s %s",
TEP_PRINT_COMM, TEP_PRINT_PID, TEP_PRINT_CPU,
TEP_PRINT_LATENCY, TEP_PRINT_TIME, TEP_PRINT_NAME, TEP_PRINT_INFO);
Output:
ls-11314 [005] d.h. 185207.366383 function __wake_up
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linux-trace-devel@vger.kernel.org
Cc: Patrick McLean <chutzpah@gentoo.org>
Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-2-tz.stoyanov@gmail.com
Link: http://lore.kernel.org/lkml/20190805204355.041132030@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.
The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.
Since we change the size of the xsk_umem_config struct, we need to version
the ABI. This patch includes the ABI versioning for xsk_umem__create. The
Makefile was also updated to handle multiple function versions in
check-abi.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
While reading a trace data file that had 100,000s of tasks, the process
took an extremely long time. I profiled it down to add_new_comm(), which
was doing a qsort() call on an array that was pretty much already sorted
(all but the last element. qsort() isn't very efficient when dealing
with mostly sorted arrays, and this definitely showed its issues.
When adding a new task to the task list, instead of using qsort(), do
another bsearch() with a function that will find the element before
where the new task will be inserted in. Then simply shift the rest of
the array, and insert the task where it belongs.
Fixes: f7d82350e5 ("tools/events: Add files to create libtraceevent.a")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lkml.kernel.org/r/20190828191820.127233764@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the re-allocation of tep->cmdlines succeeds, then the previous
allocation of tep->cmdlines will be freed. If we later fail in
add_new_comm(), we must not free cmdlines, and also should assign
tep->cmdlines to the new allocation. Otherwise when freeing tep, the
tep->cmdlines will be pointing to garbage.
Fixes: a6d2a61ac6 ("tools lib traceevent: Remove some die() calls")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linux-trace-devel@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190828191819.970121417@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Drop __NR_mmap2 fork in flavor of LFS, that is _FILE_OFFSET_BITS=64
(glibc & bionic) / LARGEFILE64_SOURCE (for musl) decision. It allows
mmap() to use 64bit offset that is passed to mmap2 syscall. As result
pgoff is not truncated and no need to use direct access to mmap2 for
32 bits systems.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add an API function taking a BTF object id and providing the id of the
next BTF object in the kernel. This can be used to list all BTF objects
loaded on the system.
v2:
- Rebase on top of Andrii's changes regarding libbpf versioning.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In preparation for the introduction of a similar function for retrieving
the id of the next BTF object, consolidate the code from
bpf_prog_get_next_id() and bpf_map_get_next_id() in libbpf.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The tools/lib/traceevent/Makefile had a test added to it to detect a failure
of the "nm" when making the dynamic list file (whatever that is). The
problem is that the test sorts the values "U W w" and some versions of sort
will place "w" ahead of "W" (even though it has a higher ASCII value, and
break the test.
Add 'tr "w" "W"' to merge the two and not worry about the ordering.
Reported-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal rarek <mmarek@suse.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org
Fixes: 6467753d61 ("tools lib traceevent: Robustify do_generate_dynamic_list_file")
Link: http://lkml.kernel.org/r/20190805130150.25acfeb1@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This commit adds support for the new need_wakeup flag in AF_XDP. The
xsk_socket__create function is updated to handle this and a new
function is introduced called xsk_ring_prod__needs_wakeup(). This
function can be used by the application to check if Rx and/or Tx
processing needs to be explicitly woken up.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently libbpf version is specified in 2 places: libbpf.map and
Makefile. They easily get out of sync and it's very easy to update one,
but forget to update another one. In addition, Github projection of
libbpf has to maintain its own version which has to be remembered to be
kept in sync manually, which is very error-prone approach.
This patch makes libbpf.map a source of truth for libbpf version and
uses shell invocation to parse out correct full and major libbpf version
to use during build. Now we need to make sure that once new release
cycle starts, we need to add (initially) empty section to libbpf.map
with correct latest version.
This also will make it possible to keep Github projection consistent
with kernel sources version of libbpf by adopting similar parsing of
version from libbpf.map.
v2->v3:
- grep -o + sort -rV (Andrey);
v1->v2:
- eager version vars evaluation (Jakub);
- simplified version regex (Andrey);
Cc: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
as well):
for (i = 1; i <= btf__get_nr_types(btf); i++) {
t = (struct btf_type *)btf__type_by_id(btf, i);
if (!has_datasec && btf_is_var(t)) {
/* replace VAR with INT */
t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
<<<<<<< HEAD
/*
* using size = 1 is the safest choice, 4 will be too
* big and cause kernel BTF validation failure if
* original variable took less than 4 bytes
*/
t->size = 1;
*(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
} else if (!has_datasec && kind == BTF_KIND_DATASEC) {
=======
t->size = sizeof(int);
*(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
} else if (!has_datasec && btf_is_datasec(t)) {
>>>>>>> 72ef80b5ee
/* replace DATASEC with STRUCT */
Conflict is between the two commits 1d4126c4e1 ("libbpf: sanitize VAR to
conservative 1-byte INT") and b03bc6853c ("libbpf: convert libbpf code to
use new btf helpers"), so we need to pick the sanitation fixup as well as
use the new btf_is_datasec() helper and the whitespace cleanup. Looks like
the following:
[...]
if (!has_datasec && btf_is_var(t)) {
/* replace VAR with INT */
t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
/*
* using size = 1 is the safest choice, 4 will be too
* big and cause kernel BTF validation failure if
* original variable took less than 4 bytes
*/
t->size = 1;
*(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
} else if (!has_datasec && btf_is_datasec(t)) {
/* replace DATASEC with STRUCT */
[...]
The main changes are:
1) Addition of core parts of compile once - run everywhere (co-re) effort,
that is, relocation of fields offsets in libbpf as well as exposure of
kernel's own BTF via sysfs and loading through libbpf, from Andrii.
More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
and http://vger.kernel.org/lpc-bpf2018.html#session-2
2) Enable passing input flags to the BPF flow dissector to customize parsing
and allowing it to stop early similar to the C based one, from Stanislav.
3) Add a BPF helper function that allows generating SYN cookies from XDP and
tc BPF, from Petar.
4) Add devmap hash-based map type for more flexibility in device lookup for
redirects, from Toke.
5) Improvements to XDP forwarding sample code now utilizing recently enabled
devmap lookups, from Jesper.
6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
and Takshak.
7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.
8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan.
9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.
10) Add perf event output helper also for other skb-based program types, from Allan.
11) Fix a co-re related compilation error in selftests, from Yonghong.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Add support for loading kernel BTF from sysfs (/sys/kernel/btf/vmlinux)
as a target BTF. Also extend the list of on disk search paths for
vmlinux ELF image with entries that perf is searching for.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-08-11
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei.
2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii.
3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak.
4) Fix bpftool to dump an error if pinning fails, from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the core logic for BPF CO-RE offsets relocations.
Every instruction that needs to be relocated has corresponding
bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying
to match recorded "local" relocation spec against potentially many
compatible "target" types, creating corresponding spec. Details of the
algorithm are noted in corresponding comments in the code.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add support for BPF CO-RE offset relocations. Add section/record
iteration macros for .BTF.ext. These macro are useful for iterating over
each .BTF.ext record, either for dumping out contents or later for BPF
CO-RE relocation handling.
To enable other parts of libbpf to work with .BTF.ext contents, moved
a bunch of type definitions into libbpf_internal.h.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add lots of frequently used helpers that simplify working with BTF
types.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull networking fixes from David Miller:
"Yeah I should have sent a pull request last week, so there is a lot
more here than usual:
1) Fix memory leak in ebtables compat code, from Wenwen Wang.
2) Several kTLS bug fixes from Jakub Kicinski (circular close on
disconnect etc.)
3) Force slave speed check on link state recovery in bonding 802.3ad
mode, from Thomas Falcon.
4) Clear RX descriptor bits before assigning buffers to them in
stmmac, from Jose Abreu.
5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
loops, from Nishka Dasgupta.
6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.
7) Need to hold sock across skb->destructor invocation, from Cong
Wang.
8) IP header length needs to be validated in ipip tunnel xmit, from
Haishuang Yan.
9) Use after free in ip6 tunnel driver, also from Haishuang Yan.
10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
Heiner Kallweit.
11) Upon bridge device init failure, we need to delete the local fdb.
From Nikolay Aleksandrov.
12) Handle erros from of_get_mac_address() properly in stmmac, from
Martin Blumenstingl.
13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
Kadlecsik.
14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
some devices, so revert. From Johannes Berg.
15) Fix deadlock in rxrpc, from David Howells.
16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
Haibing.
17) Fix mvpp2 crash on module removal, from Matteo Croce.
18) Fix race in genphy_update_link, from Heiner Kallweit.
19) bpf_xdp_adjust_head() stopped working with generic XDP when we
fixes generic XDP to support stacked devices properly, fix from
Jesper Dangaard Brouer.
20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
David Ahern.
21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
net: dsa: sja1105: Fix memory leak on meta state machine error path
net: dsa: sja1105: Fix memory leak on meta state machine normal path
net: dsa: sja1105: Really fix panic on unregistering PTP clock
net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
net: sched: sample: allow accessing psample_group with rtnl
net: sched: police: allow accessing police->params with rtnl
net: hisilicon: Fix dma_map_single failed on arm64
net: hisilicon: fix hip04-xmit never return TX_BUSY
net: hisilicon: make hip04_tx_reclaim non-reentrant
tc-testing: updated vlan action tests with batch create/delete
net sched: update vlan action for batched events operations
net: stmmac: tc: Do not return a fragment entry
net: stmmac: Fix issues when number of Queues >= 4
net: stmmac: xgmac: Fix XGMAC selftests
be2net: disable bh with spin_lock in be_process_mcc
net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
...
5d01ab7bac ("libbpf: fix erroneous multi-closing of BTF FD")
introduced backwards-compatibility issue, manifesting itself as -E2BIG
error returned on program load due to unknown non-zero btf_fd attribute
value for BPF_PROG_LOAD sys_bpf() sub-command.
This patch fixes bug by ensuring that we only ever associate BTF FD with
program if there is a BTF.ext data that was successfully loaded into
kernel, which automatically means kernel supports func_info/line_info
and associated BTF FD for progs (checked and ensured also by BTF
sanitization code).
Fixes: 5d01ab7bac ("libbpf: fix erroneous multi-closing of BTF FD")
Reported-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Having static variable `cpus` in libbpf_num_possible_cpus function
without guarding it with mutex makes this function thread-unsafe.
If multiple threads accessing this function, in the current form; it
leads to incrementing the static variable value `cpus` in the multiple
of total available CPUs.
Used local stack variable to calculate the number of possible CPUs and
then updated the static variable using WRITE_ONCE().
Changes since v1:
* added stack variable to calculate cpus
* serialized static variable update using WRITE_ONCE()
* fixed Fixes tag
Fixes: 6446b31555 ("bpf: add a new API libbpf_num_possible_cpus()")
Signed-off-by: Takshak Chahande <ctakshak@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This adds the definition for BPF_MAP_TYPE_DEVMAP_HASH to libbpf_probes.c in
tools/lib/bpf.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
hashmap.h depends on __WORDSIZE being defined. It is defined by
glibc/musl in different headers. It's an explicit goal for musl to be
"non-detectable" at compilation time, so instead include glibc header if
glibc is explicitly detected and fall back to musl header otherwise.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Fixes: e3b9242240 ("libbpf: add resizable non-thread safe internal hashmap")
Link: https://lkml.kernel.org/r/20190718173021.2418606-1-andriin@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By returning previously set print callback from libbpf_set_print, it's
possible to restore it, eventually. This is useful when running many
independent test with one default print function, but overriding log
verbosity for particular subset of tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Libbpf stores associated BTF FD per each instance of bpf_program. When
program is unloaded, that FD is closed. This is wrong, because leads to
a race and possibly closing of unrelated files, if application
simultaneously opens new files while bpf_programs are unloaded.
It's also unnecessary, because struct btf "owns" that FD, and
btf__free(), called from bpf_object__close() will close it. Thus the fix
is to never have per-program BTF FD and fetch it from obj->btf, when
necessary.
Fixes: 2993e0515b ("tools/bpf: add support to read .BTF.ext sections")
Reported-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-07-25
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix segfault in libbpf, from Andrii.
2) fix gso_segs access, from Eric.
3) tls/sockmap fixes, from Jakub and John.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Despite a proper NULL-termination after strncpy(..., ..., IFNAMSIZ - 1),
GCC8 still complains about *expected* string truncation:
xsk.c:330:2: error: 'strncpy' output may be truncated copying 15 bytes
from a string of length 15 [-Werror=stringop-truncation]
strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
This patch gets rid of the issue altogether by using memcpy instead.
There is no performance regression, as strncpy will still copy and fill
all of the bytes anyway.
v1->v2:
- rebase against bpf tree.
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When BPF program defines uninitialized global variable, it's put into
a special COMMON section. Libbpf will reject such programs, but will
provide very unhelpful message with garbage-looking section index.
This patch detects special section cases and gives more explicit error
message.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
'channels.max_combined' initialized only on ioctl success and
errno is only valid on ioctl failure.
The code doesn't produce any runtime issues, but makes memory
sanitizers angry:
Conditional jump or move depends on uninitialised value(s)
at 0x55C056F: xsk_get_max_queues (xsk.c:336)
by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354)
by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447)
by 0x55C0E57: xsk_socket__create (xsk.c:601)
Uninitialised value was created by a stack allocation
at 0x55C04CD: xsk_get_max_queues (xsk.c:318)
Additionally fixed warning on uninitialized bytes in ioctl arguments:
Syscall param ioctl(SIOCETHTOOL) points to uninitialised byte(s)
at 0x648D45B: ioctl (in /usr/lib64/libc-2.28.so)
by 0x55C0546: xsk_get_max_queues (xsk.c:330)
by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354)
by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447)
by 0x55C0E57: xsk_socket__create (xsk.c:601)
Address 0x1ffefff378 is on thread 1's stack
in frame #1, created by xsk_get_max_queues (xsk.c:318)
Uninitialised value was created by a stack allocation
at 0x55C04CD: xsk_get_max_queues (xsk.c:318)
CC: Magnus Karlsson <magnus.karlsson@intel.com>
Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As it fails to build in some systems with:
libbpf.c: In function 'perf_buffer__new':
libbpf.c:4515: error: unknown field 'sample_period' specified in initializer
libbpf.c:4516: error: unknown field 'wakeup_events' specified in initializer
Doing as:
attr.sample_period = 1;
I.e. not as a designated initializer makes it build everywhere.
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: fb84b82246 ("libbpf: add perf buffer API")
Link: https://lkml.kernel.org/n/tip-hnlmch8qit1ieksfppmr32si@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Using endian.h and its endianness macros makes this code build in a
wider range of compilers, as some don't have those macros
(__BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__),
so use instead endian.h's macros (__BYTE_ORDER, __LITTLE_ENDIAN,
__BIG_ENDIAN) which makes this code even shorter :-)
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 12ef5634a8 ("libbpf: simplify endianness check")
Fixes: e6c64855fd ("libbpf: add btf__parse_elf API to load .BTF and .BTF.ext")
Link: https://lkml.kernel.org/n/tip-eep5n8vgwcdphw3uc058k03u@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If VAR in non-sanitized BTF was size less than 4, converting such VAR
into an INT with size=4 will cause BTF validation failure due to
violationg of STRUCT (into which DATASEC was converted) member size.
Fix by conservatively using size=1.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In case when BTF loading fails despite sanitization, but BPF object has
.BTF.ext loaded as well, we free and null obj->btf, but not
obj->btf_ext. This leads to an attempt to relocate .BTF.ext later on
during bpf_object__load(), which assumes obj->btf is present. This leads
to SIGSEGV on null pointer access. Fix bug by freeing and nulling
obj->btf_ext as well.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull networking fixes from David Miller:
1) Fix AF_XDP cq entry leak, from Ilya Maximets.
2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit.
3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika.
4) Fix handling of neigh timers wrt. entries added by userspace, from
Lorenzo Bianconi.
5) Various cases of missing of_node_put(), from Nishka Dasgupta.
6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing.
7) Various RDS layer fixes, from Gerd Rausch.
8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from
Cong Wang.
9) Fix FIB source validation checks over loopback, also from Cong Wang.
10) Use promisc for unsupported number of filters, from Justin Chen.
11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
tcp: fix tcp_set_congestion_control() use from bpf hook
ag71xx: fix return value check in ag71xx_probe()
ag71xx: fix error return code in ag71xx_probe()
usb: qmi_wwan: add D-Link DWM-222 A2 device ID
bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
net: dsa: sja1105: Fix missing unlock on error in sk_buff()
gve: replace kfree with kvfree
selftests/bpf: fix test_xdp_noinline on s390
selftests/bpf: fix "valid read map access into a read-only array 1" on s390
net/mlx5: Replace kfree with kvfree
MAINTAINERS: update netsec driver
ipv6: Unlink sibling route in case of failure
liquidio: Replace vmalloc + memset with vzalloc
udp: Fix typo in net/ipv4/udp.c
net: bcmgenet: use promisc for unsupported filters
ipv6: rt6_check should return NULL if 'from' is NULL
tipc: initialize 'validated' field of received packets
selftests: add a test case for rp_filter
fib: relax source validation check for loopback packets
mlxsw: spectrum: Do not process learned records with a dummy FID
...
Similar issue was fixed in cdfc7f888c ("libbpf: fix GCC8 warning for
strncpy") already. This one was missed. Fixing now.
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Intel PT:
Adrian Hunter:
- Fix DROP VIEW power_events_view in the postgresql and sqlite export-db
python scripts.
perf script:
Song Liu:
- Assume native_arch for pipe mode, fixing a segfault.
perf inject:
Arnaldo Carvalho de Melo:
- The tool->read() call may pass a NULL evsel, handle it.
core:
Arnaldo Carvalho de Melo:
- Move zalloc/zfree.c to tools/lib, further eroding tools/perf/util.[ch]
- Use zfree() where applicable instead of open coded equivalent.
- Add stdlib.h and some other headers to places where its needed and were
getting via util.h, that doesn't need that anymore.
- Use list_del_init() more thoroughly.
Miscellaneous:
Leo Yan:
- Fix use after free and potential NULL pointer derefs detected by the
smatch tool in various places.
Luke Mujica:
- Remove a couple unused variables in the parse-events code.
Numfor Mbiziwo-Tiapo:
- Initialize variable to suppress memory sanitizer warning in the
mmap-thread-lookup 'perf test' entry.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXSTb7wAKCRCyPKLppCJ+
J7zZAQCxJ5N2U/+kSGKU9OPKnvB2XxeBV1l0Zy5OEPkSfNqwxgEA4sFBjr6mwarU
2M6vwp/90ENkhBQkfTnhunVYkLHaMww=
=H6yU
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-5.3-20190709' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes:
Intel PT:
Adrian Hunter:
- Fix DROP VIEW power_events_view in the postgresql and sqlite export-db
python scripts.
perf script:
Song Liu:
- Assume native_arch for pipe mode, fixing a segfault.
perf inject:
Arnaldo Carvalho de Melo:
- The tool->read() call may pass a NULL evsel, handle it.
core:
Arnaldo Carvalho de Melo:
- Move zalloc/zfree.c to tools/lib, further eroding tools/perf/util.[ch]
- Use zfree() where applicable instead of open coded equivalent.
- Add stdlib.h and some other headers to places where its needed and were
getting via util.h, that doesn't need that anymore.
- Use list_del_init() more thoroughly.
Miscellaneous:
Leo Yan:
- Fix use after free and potential NULL pointer derefs detected by the
smatch tool in various places.
Luke Mujica:
- Remove a couple unused variables in the parse-events code.
Numfor Mbiziwo-Tiapo:
- Initialize variable to suppress memory sanitizer warning in the
mmap-thread-lookup 'perf test' entry.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On 32-bit platforms compiler complains about conversion:
libbpf.c: In function ‘perf_event_open_probe’:
libbpf.c:4112:17: error: cast from pointer to integer of different
size [-Werror=pointer-to-int-cast]
attr.config1 = (uint64_t)(void *)name; /* kprobe_func or uprobe_path */
^
Reported-by: Matt Hart <matthew.hart@linaro.org>
Fixes: b265002747 ("libbpf: add kprobe/uprobe attach API")
Tested-by: Matt Hart <matthew.hart@linaro.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull networking updates from David Miller:
"Some highlights from this development cycle:
1) Big refactoring of ipv6 route and neigh handling to support
nexthop objects configurable as units from userspace. From David
Ahern.
2) Convert explored_states in BPF verifier into a hash table,
significantly decreased state held for programs with bpf2bpf
calls, from Alexei Starovoitov.
3) Implement bpf_send_signal() helper, from Yonghong Song.
4) Various classifier enhancements to mvpp2 driver, from Maxime
Chevallier.
5) Add aRFS support to hns3 driver, from Jian Shen.
6) Fix use after free in inet frags by allocating fqdirs dynamically
and reworking how rhashtable dismantle occurs, from Eric Dumazet.
7) Add act_ctinfo packet classifier action, from Kevin
Darbyshire-Bryant.
8) Add TFO key backup infrastructure, from Jason Baron.
9) Remove several old and unused ISDN drivers, from Arnd Bergmann.
10) Add devlink notifications for flash update status to mlxsw driver,
from Jiri Pirko.
11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.
12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.
13) Various enhancements to ipv6 flow label handling, from Eric
Dumazet and Willem de Bruijn.
14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
der Merwe, and others.
15) Various improvements to axienet driver including converting it to
phylink, from Robert Hancock.
16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.
17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
Radulescu.
18) Add devlink health reporting to mlx5, from Moshe Shemesh.
19) Convert stmmac over to phylink, from Jose Abreu.
20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
Shalom Toledo.
21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.
22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.
23) Track spill/fill of constants in BPF verifier, from Alexei
Starovoitov.
24) Support bounded loops in BPF, from Alexei Starovoitov.
25) Various page_pool API fixes and improvements, from Jesper Dangaard
Brouer.
26) Just like ipv4, support ref-countless ipv6 route handling. From
Wei Wang.
27) Support VLAN offloading in aquantia driver, from Igor Russkikh.
28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.
29) Add flower GRE encap/decap support to nfp driver, from Pieter
Jansen van Vuuren.
30) Protect against stack overflow when using act_mirred, from John
Hurley.
31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.
32) Use page_pool API in netsec driver, Ilias Apalodimas.
33) Add Google gve network driver, from Catherine Sullivan.
34) More indirect call avoidance, from Paolo Abeni.
35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.
36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.
37) Add MPLS manipulation actions to TC, from John Hurley.
38) Add sending a packet to connection tracking from TC actions, and
then allow flower classifier matching on conntrack state. From
Paul Blakey.
39) Netfilter hw offload support, from Pablo Neira Ayuso"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
net/mlx5e: Return in default case statement in tx_post_resync_params
mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
net: dsa: add support for BRIDGE_MROUTER attribute
pkt_sched: Include const.h
net: netsec: remove static declaration for netsec_set_tx_de()
net: netsec: remove superfluous if statement
netfilter: nf_tables: add hardware offload support
net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
net: flow_offload: add flow_block_cb_is_busy() and use it
net: sched: remove tcf block API
drivers: net: use flow block API
net: sched: use flow block API
net: flow_offload: add flow_block_cb_{priv, incref, decref}()
net: flow_offload: add list handling functions
net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
net: flow_offload: add flow_block_cb_setup_simple()
net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
...
Eroding a bit more the tools/perf/util/util.h hodpodge header.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_buffer "object" is part of libbpf API now, add it to the list of
libbpf function prefixes.
Suggested-by: Daniel Borkman <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For BPF_MAP_TYPE_PERF_EVENT_ARRAY typically correct size is number of
possible CPUs. This is impossible to specify at compilation time. This
change adds automatic setting of PERF_EVENT_ARRAY size to number of
system CPUs, unless non-zero size is specified explicitly. This allows
to adjust size for advanced specific cases, while providing convenient
and logical defaults.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program
to user space for additional processing. libbpf already has very low-level API
to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to
use and requires a lot of code to set everything up. This patch adds
perf_buffer abstraction on top of it, abstracting setting up and polling
per-CPU logic into simple and convenient API, similar to what BCC provides.
perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF
map entries. It accepts two user-provided callbacks: one for handling raw
samples and one for get notifications of lost samples due to buffer overflow.
perf_buffer__new_raw() is similar, but provides more control over how
perf events are set up (by accepting user-provided perf_event_attr), how
they are handled (perf_event_header pointer is passed directly to
user-provided callback), and on which CPUs ring buffers are created
(it's possible to provide a list of CPUs and corresponding map keys to
update). This API allows advanced users fuller control.
perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
utilizing epoll instance.
perf_buffer__free() does corresponding clean up and unsets FDs from BPF map.
All APIs are not thread-safe. User should ensure proper locking/coordination if
used in multi-threaded set up.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Change BTF-defined map definitions to capture compile-time integer
values as part of BTF type definition, to avoid split of key/value type
information and actual type/size/flags initialization for maps.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a wrapper utilizing bpf_link "infrastructure" to allow attaching BPF
programs to raw tracepoints.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Allow attaching BPF programs to kernel tracepoint BPF hooks specified by
category and name.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>