Commit Graph

15717 Commits

Author SHA1 Message Date
Ido Schimmel 94d302deae selftests: mlxsw: Add a test for VxLAN flooding
The device stores flood records in a singly linked list where each
record stores up to three IPv4 addresses of remote VTEPs. The test
verifies that packets are correctly flooded in various cases such as
deletion of a record in the middle of the list.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Ido Schimmel 99c9b084f0 selftests: mlxsw: Add a test for VxLAN configuration
Test various aspects of VxLAN offloading which are specific to mlxsw,
such as sanitization of invalid configurations and offload indication.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata 3485f87cb7 selftests: forwarding: vxlan_bridge_1d_port_8472: New test
This simple wrapper reruns the VXLAN ping test with a port number of
8472.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata a0b61f3d8e selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test
Test that when decapsulating from VXLAN, the values of inner and outer
TOS are handled appropriately. Because VXLAN driver on its own won't
produce the arbitrary TOS combinations necessary to test this feature,
simply open-code a single ICMP packet and have mausezahn assemble it.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata 1e5abfb3ff selftests: forwarding: vxlan_bridge_1d: Add an ECN encap test
Test that ECN bits in the VXLAN envelope are correctly deduced from the
overlay packet.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata d417ecf533 selftests: forwarding: vxlan_bridge_1d: Add a TOS test
Test that TOS is inherited from the tunneled packet into the envelope as
configured at the VXLAN device.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata b3a7ee74ee selftests: forwarding: vxlan_bridge_1d: Add a TTL test
This tests whether TTL of VXLAN envelope packets is properly set based
on the device configuration.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata 50a02b0825 selftests: forwarding: vxlan_bridge_1d: Reconfigure & rerun tests
The ordering of the topology creation can have impact on whether a
driver is successful in offloading VXLAN. Therefore add a pseudo-test
that reshuffles bits of the topology, and then reruns the same suite of
tests again to make sure that the new setup is supported as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata bfd1e27038 selftests: forwarding: vxlan_bridge_1d: Add unicast test
Test that when sending traffic to a learned MAC address, the traffic is
forwarded accurately only to the right endpoint.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata edaa117efe selftests: forwarding: vxlan_bridge_1d: Add flood test
Test that when sending traffic to an unlearned MAC address, the traffic
is flooded to both remote VXLAN endpoints.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata 5852fd07c4 selftests: forwarding: vxlan_bridge_1d: Add ping test
Test end-to-end reachability between local and remote endpoints.

Note that because learning is disabled on the VXLAN device, the ICMP
requests will end up being flooded to all remotes.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:44 -08:00
Petr Machata fd64d5a2e3 selftests: forwarding: Add a skeleton of vxlan_bridge_1d
This skeleton sets up a topology with three VXLAN endpoints: one
"local", possibly offloaded, and two "remote", formed using veth pairs
and likely purely software bridges. The "local" endpoint is connected to
host systems by a VLAN-unaware bridge.

Since VXLAN tunnels must be unique per namespace, each of the "remote"
endpoints is in its own namespace. H3 forms the bridge between the three
domains.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:43 -08:00
Petr Machata d1038cd0f6 selftests: forwarding: lib: Add link_stats_rx_errors_get()
Such a function will be useful for counting malformed packets in the ECN
decap test.

To that end, introduce a common handler for handling stat-fetching, and
reuse it in link_stats_tx_packets_get() and link_stats_rx_errors_get().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:43 -08:00
Petr Machata d20b0f214a selftests: forwarding: ping{6, }_do(): Allow passing ping arguments
Make the ping routine more generic by allowing passing arbitrary ping
command-line arguments.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:43 -08:00
Petr Machata 58c7a2d19e selftests: forwarding: ping{6, }_test(): Add description argument
Have ping_test() recognize an optional argument with a description of
the test. This is handy if there are several ping test, to make it clear
which is which.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:43 -08:00
Petr Machata d0540d1706 selftests: forwarding: lib: Add in_ns()
In order to run a certain command inside another network namespace, it's
possible to use "ip netns exec ns command". However then one can't use
functions defined in lib.sh or a test suite.

One option is to do "ip netns exec ns bash -c command", provided that
all functions that one wishes to use (and their dependencies) are
published using "export -f". That may not be practical.

Therefore, introduce a helper in_ns(), which wraps a given command in a
boilerplate of "ip netns exec" and "source lib.sh", thus making all
library functions available. (Custom functions that a script wishes to
run within a namespace still need to be exported.)

Because quotes in "$@" aren't recognized in heredoc, hand-expand the
array in an explicit for loop, leveraging printf %q to handle proper
quoting.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:43 -08:00
Petr Machata 601bc1c139 selftests: forwarding: lib: Support NUM_NETIFS of 0
So far the case of NUM_NETIFS of 0 has not been interesting. However if
one wishes to reuse the lib.sh routines in a setup of a separate
namespace, being able to import like this is handy.

Therefore replace the {1..$NUM_NETIFS} references, which cause iteration
over 1 and 0, with an explicit for loop like we do in setup_wait() and
tc_offload_check(), so that for NUM_NETIFS of 0 no iteration is done.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 17:59:43 -08:00
Lorenz Bauer bf5d68c730 tools: add selftest for BPF_F_ZERO_SEED
Check that iterating two separate hash maps produces the same
order of keys if BPF_F_ZERO_SEED is used.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20 00:53:40 +01:00
Lorenz Bauer 608114e441 tools: sync linux/bpf.h
Synchronize changes to linux/bpf.h from
* "bpf: allow zero-initializing hash map seed"
* "bpf: move BPF_F_QUERY_EFFECTIVE after map flags"

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20 00:53:40 +01:00
Stanislav Fomichev 23499442c3 bpf: libbpf: retry map creation without the name
Since commit 88cda1c9da ("bpf: libbpf: Provide basic API support
to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
for maps. Pre v4.14 kernels don't know about map names and return an
error about unexpected non-zero data. Retry sys_bpf without a map
name to cover older kernels.

v2 changes:
* check for errno == EINVAL as suggested by Daniel Borkmann

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20 00:49:32 +01:00
David S. Miller f2be6d710d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-19 10:55:00 -08:00
Linus Torvalds f2ce1065e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix some potentially uninitialized variables and use-after-free in
    kvaser_usb can drier, from Jimmy Assarsson.

 2) Fix leaks in qed driver, from Denis Bolotin.

 3) Socket leak in l2tp, from Xin Long.

 4) RSS context allocation fix in bnxt_en from Michael Chan.

 5) Fix cxgb4 build errors, from Ganesh Goudar.

 6) Route leaks in ipv6 when removing exceptions, from Xin Long.

 7) Memory leak in IDR allocation handling of act_pedit, from Davide
    Caratti.

 8) Use-after-free of bridge vlan stats, from Nikolay Aleksandrov.

 9) When MTU is locked, do not force DF bit on ipv4 tunnels. From
    Sabrina Dubroca.

10) When NAPI cached skb is reused, we must set it to the proper initial
    state which includes skb->pkt_type. From Eric Dumazet.

11) Lockdep and non-linear SKB handling fix in tipc from Jon Maloy.

12) Set RX queue properly in various tuntap receive paths, from Matthew
    Cover.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tuntap: fix multiqueue rx
  ipv6: Fix PMTU updates for UDP/raw sockets in presence of VRF
  tipc: don't assume linear buffer when reading ancillary data
  tipc: fix lockdep warning when reinitilaizing sockets
  net-gro: reset skb->pkt_type in napi_reuse_skb()
  tc-testing: tdc.py: Guard against lack of returncode in executed command
  tc-testing: tdc.py: ignore errors when decoding stdout/stderr
  ip_tunnel: don't force DF when MTU is locked
  MAINTAINERS: Add entry for CAKE qdisc
  net: bridge: fix vlan stats use-after-free on destruction
  socket: do a generic_file_splice_read when proto_ops has no splice_read
  net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs
  Revert "net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs"
  net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs
  net/sched: act_pedit: fix memory leak when IDR allocation fails
  net: lantiq: Fix returned value in case of error in 'xrx200_probe()'
  ipv6: fix a dst leak when removing its exception
  net: mvneta: Don't advertise 2.5G modes
  drivers/net/ethernet/qlogic/qed/qed_rdma.h: fix typo
  net/mlx4: Fix UBSAN warning of signed integer overflow
  ...
2018-11-19 09:24:04 -08:00
Linus Torvalds 25e19c1fe4 libnvdimm 4.20-rc3
- Address Range Scrub overflow continuation handling has been broken
   since it was initially merged. It was only recently that error injection
   and platform-BIOS support enabled this corner case to be exercised.
 
 - The recent attempt to provide more isolation for the kernel Address
   Range Scrub state machine from userapace initiated sessions triggers a
   lockdep report. Revert and try again at the next merge window.
 
 - Fix a kasan reported buffer overflow in libnvdimm unit test
   infrastrucutre (nfit_test)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb8MdUAAoJEB7SkWpmfYgCifYP/A+OQ19HybqcY2nfvqXUdQum
 Q5x3qcKmGmEbKbnCUMOHZJpEjW4c/Cpm6OKhuFDQJ4tijn1XG3/ATSi7PXrZxs/o
 CRK8MIg5Wz/mMvYRvypIkCHHHr9+Y1NjqmQynM4LLzNG24GMXaeHHuZUTnrZCDmu
 0+jBTylNgVYdykoIxgHDYDB+cd6w4NtAP5OD9D46pdsmzX9ac+OQyZMyNB3glUhd
 /ZFAoywVNfvfJVWEci9RoHiKttWxgVoCuNbSlCs2Y6ymepA44ApR9AgLHtaC9pFO
 DrPkfCzPSmf4PVSxLJd79+/sw9YOcBD7LZ5IxzozxRMuRn5pIofdZIsBg9PlwT5B
 NL9jQK87XPiG0vNxhJu3wzP+FlyCXxGxkWfApp7w4rlWBV7RgugOZHyH051rdKzQ
 44JAPzLLCfA5Mj4o2tIbSx42f2JNX93XDEX8fkUB+qs3GzyOcMtlcmz9UjmnrT0R
 o9KHKhDn81Vivxh33Ts2G0iHktO83XSUBDWApSd6erjEUXMsCLY0D8y+nDGTOMUh
 kVcY8q93sgZGLVbcxt0eGc8Q7osZYawQGRGucflTETFcxNwMyLL4F9lWgPirGeYF
 i1JDWeTrhcImYufNj8o78LsbT5xh6YjbZZ8Q1obIgPXpDtxHNIXO6COId49Zp2cK
 obftWyVp+7kYe79NWzmD
 =sfNx
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "A small batch of fixes for v4.20-rc3.

  The overflow continuation fix addresses something that has been broken
  for several releases. Arguably it could wait even longer, but it's a
  one line fix and this finishes the last of the known address range
  scrub bug reports. The revert addresses a lockdep regression. The unit
  tests are not critical to fix, but no reason to hold this fix back.

  Summary:

   - Address Range Scrub overflow continuation handling has been broken
     since it was initially merged. It was only recently that error
     injection and platform-BIOS support enabled this corner case to be
     exercised.

   - The recent attempt to provide more isolation for the kernel Address
     Range Scrub state machine from userapace initiated sessions
     triggers a lockdep report. Revert and try again at the next merge
     window.

   - Fix a kasan reported buffer overflow in libnvdimm unit test
     infrastrucutre (nfit_test)"

* tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  Revert "acpi, nfit: Further restrict userspace ARS start requests"
  acpi, nfit: Fix ARS overflow continuation
  tools/testing/nvdimm: Fix the array size for dimm devices.
2018-11-18 12:21:09 -08:00
Brenda J. Butler c6cecf4ae4 tc-testing: tdc.py: Guard against lack of returncode in executed command
Add some defensive coding in case one of the subprocesses created by tdc
returns nothing. If no object is returned from exec_cmd, then tdc will
halt with an unhandled exception.

Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17 21:54:53 -08:00
Lucas Bates 5aaf642852 tc-testing: tdc.py: ignore errors when decoding stdout/stderr
Prevent exceptions from being raised while decoding output
from an executed command. There is no impact on tdc's
execution and the verify command phase would fail the pattern
match.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17 21:54:53 -08:00
Paolo Abeni 9c549a6b05 selftests: add explicit test for multiple concurrent GRO sockets
This covers for proper accounting of encap needed static keys

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:03:20 -08:00
Edward Cree afd5942408 bpf: fix off-by-one error in adjust_subprog_starts
When patching in a new sequence for the first insn of a subprog, the start
 of that subprog does not change (it's the first insn of the sequence), so
 adjust_subprog_starts should check start <= off (rather than < off).
Also added a test to test_verifier.c (it's essentially the syz reproducer).

Fixes: cc8b0b92a1 ("bpf: introduce function calls (function boundaries)")
Reported-by: syzbot+4fc427c7af994b0948be@syzkaller.appspotmail.com
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 21:10:00 -08:00
Stanislav Fomichev 29a9c10e41 bpftool: make libbfd optional
Make it possible to build bpftool without libbfd. libbfd and libopcodes are
typically provided in dev/dbg packages (binutils-dev in debian) which we
usually don't have installed on the fleet machines and we'd like a way to have
bpftool version that works without installing any additional packages.
This excludes support for disassembling jit-ted code and prints an error if
the user tries to use these features.

Tested by:
cat > FEATURES_DUMP.bpftool <<EOF
feature-libbfd=0
feature-disassembler-four-args=1
feature-reallocarray=0
feature-libelf=1
feature-libelf-mmap=1
feature-bpf=1
EOF
FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make
ldd bpftool | grep libbfd

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 20:45:01 -08:00
Andrey Ignatov 9108e3a023 selftest/bpf: Use bpf_sk_lookup_{tcp, udp} in test_sock_addr
Use bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers from
test_sock_addr programs to make sure they're available and can lookup
and release socket properly for IPv4/IPv4, TCP/UDP.

Reading from a few fields of returned struct bpf_sock is also tested.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 17:54:29 -08:00
Martin KaFai Lau a83d6e76a6 bpf: libbpf: Fix bpf_program__next() API
This patch restores the behavior in
commit eac7d84519 ("tools: libbpf: don't return '.text' as a program for multi-function programs")
such that bpf_program__next() does not return pseudo programs in ".text".

Fixes: 0c19a9fbc9 ("libbpf: cleanup after partial failure in bpf_object__pin")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 17:46:54 -08:00
Joe Stringer 5c86d2125b selftests/bpf: Fix uninitialized duration warning
Daniel Borkmann reports:

test_progs.c: In function ‘main’:
test_progs.c:81:3: warning: ‘duration’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\
   ^~~~~~
test_progs.c:1706:8: note: ‘duration’ was declared here
  __u32 duration;
        ^~~~~~~~

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 17:40:10 -08:00
Linus Torvalds ef268de197 powerpc fixes for 4.20 #3
Two weeks worth of fixes since rc1.
 
  - I broke 16-byte alignment of the stack when we moved PPR into pt_regs.
    Despite being required by the ABI this broke almost nothing, we eventually
    hit it in code where GCC does arithmetic on the stack pointer assuming the
    bottom 4 bits are clear. Fix it by padding the in-kernel pt_regs by 8 bytes.
 
  - A couple of commits fixing minor bugs in the recent SLB rewrite.
 
  - A build fix related to tracepoints in KVM in some configurations.
 
  - Our old "IO workarounds" code written for Cell couldn't coexist in a kernel
    that runs on Power9 with the Radix MMU, fix that.
 
  - Remove the NPU DMA ops, these just printed a warning and should never have
    been called.
 
  - Suppress an overly chatty message triggered by CPU hotplug in some configs.
 
  - Two small selftest fixes.
 
 Thanks to:
   Alistair Popple, Gustavo Romero, Nicholas Piggin, Satheesh Rajendran, Scott Wood.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb7qyjAAoJEFHr6jzI4aWA73kP/3x+vghoHthxrazbN+9Z0bWQ
 +c5fHobHLuREjzLhy73lbCE9NOhZTlmdgfAB/MRWW9aVIzHViuhUjRLpLqw0+LBA
 mWIrVloJeMcbupE+zFnc6qh1WY/YZ8lsPZxmb5YSqDSxtcdh8JzDK+RgWn9XkiFa
 sjppaZoLLf/Wxz4VT4v75o8WXEFavpbEaS2PLdWhwT1//H4QpKYWY80tPCijdRhp
 0susCzObBfdwxS4qlwmLBmCxbGhqLzBg1vnPPGq6GypRELIeqR+jHWOjzYmmvQRh
 hLffVTaHIVFgO9c4ruCFmMsJCA1hf186w/62IHXLgOfp7eQJYPQYCn7uVVTWoVDC
 5hpPR71xOkovLJbipk07lshj/kVJQVapCbyGtOz8DKgQnAWcSq23HMuJBHwSEnKH
 xtIk6iupik+7gWjDDY0Yz0xiCmZLRS5heWNAJgXpPFNMSR47EkmU4bZpMrWwbetf
 CashFhbYcFzpPaP7bHgxLk79fhUitkwCFFlSQK3Hj4bog+U2KmKnKgLphn8If8jy
 iHuggxvxCwDvfn+d9X29VJu7Bxz4M5l0ouEp0+hWAockGCsaDxXNuzJNMk1V5GyO
 Ytg6UtGBxLwKBKLqNeXmuY0/CD1lALU5mD/KSIKDjm7skM7T0U3s8T+7/GpFqQ7J
 uQ9aA49aWy4kUy+xCay5
 =fEqA
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Two weeks worth of fixes since rc1.

   - I broke 16-byte alignment of the stack when we moved PPR into
     pt_regs. Despite being required by the ABI this broke almost
     nothing, we eventually hit it in code where GCC does arithmetic on
     the stack pointer assuming the bottom 4 bits are clear. Fix it by
     padding the in-kernel pt_regs by 8 bytes.

   - A couple of commits fixing minor bugs in the recent SLB rewrite.

   - A build fix related to tracepoints in KVM in some configurations.

   - Our old "IO workarounds" code written for Cell couldn't coexist in
     a kernel that runs on Power9 with the Radix MMU, fix that.

   - Remove the NPU DMA ops, these just printed a warning and should
     never have been called.

   - Suppress an overly chatty message triggered by CPU hotplug in some
     configs.

   - Two small selftest fixes.

  Thanks to: Alistair Popple, Gustavo Romero, Nicholas Piggin, Satheesh
  Rajendran, Scott Wood"

* tag 'powerpc-4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Adjust wild_bctr to build with old binutils
  powerpc/64: Fix kernel stack 16-byte alignment
  powerpc/numa: Suppress "VPHN is not supported" messages
  selftests/powerpc: Fix wild_bctr test to work on ppc64
  powerpc/io: Fix the IO workarounds code to work with Radix
  powerpc/mm/64s: Fix preempt warning in slb_allocate_kernel()
  KVM: PPC: Move and undef TRACE_INCLUDE_PATH/FILE
  powerpc/mm/64s: Only use slbfee on CPUs that support it
  powerpc/mm/64s: Use PPC_SLBFEE macro
  powerpc/mm/64s: Consolidate SLB assertions
  powerpc/powernv/npu: Remove NPU DMA ops
2018-11-16 10:14:54 -06:00
Jiri Pirko 3b423271b8 selftests: mlxsw: spectrum-2: Add simple delta test
Track the basic codepaths of delta handling, using objagg tracepoints.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Jiri Pirko 7dc5a0eeea selftests: Adjust spectrum-2 ctcam_two_atcam_masks_test
In order for this to behave as required with delta bits, change the mask
for rule with handle 103.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Jiri Pirko 36107c485f selftests: Adjust spectrum-2 two_mask_test
In order for this to behave as required with delta bits, change the mask
for rule with handle 103.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Gustavo Romero b2fed34a62 selftests/powerpc: Adjust wild_bctr to build with old binutils
Currently the selftest wild_bctr can fail to build when an old gcc is
used, notably on gcc using a binutils version <= 2.27, because the
assembler does not support the integer suffix UL.

This patch adjusts the wild_bctr test so the REG_POISON value is still
treated as an unsigned long for the shifts on compilation but the UL
suffix is absent on the stringification, so the inline asm code
generated has no UL suffixes.

Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
[mpe: Wrap long line]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-15 23:05:17 +11:00
Florian Westphal 25d8bcedbf selftests: add script to stress-test nft packet path vs. control plane
Start flood ping for each cpu while loading/flushing rulesets to make
sure we do not access already-free'd rules from nf_tables evaluation loop.

Also add this to TARGETS so 'make run_tests' in selftest dir runs it
automatically.

This would have caught the bug fixed in previous change
("netfilter: nf_tables: do not skip inactive chains during generation update")
sooner.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12 16:13:35 +01:00
Michael Ellerman 2c7645b0f7 selftests/powerpc: Fix wild_bctr test to work on ppc64
The selftest I recently added to test branching to an out-of-bounds
NIP doesn't work on 64-bit big endian. It does fail but not in the
right way. That is it SEGVs trying to load from the opd at BAD_NIP,
but it never gets as far as branching to BAD_NIP.

To fix it we need to create an opd which is reachable but which holds
the bad address.

Fixes: b7683fc66e ("selftests/powerpc: Add a test of wild bctr")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-12 14:47:54 +11:00
David S. Miller 2b9b7502df Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-11 17:57:54 -08:00
Andrey Ignatov e7605475f5 selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr
Add more test cases for context bpf_sock_addr to test narrow loads with
offset > 0 for ctx->user_ip4 field (__u32):
* off=1, size=1;
* off=2, size=1;
* off=3, size=1;
* off=2, size=2.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 22:29:59 -08:00
Andrey Ignatov 6c2afb674d selftests/bpf: Test narrow loads with off > 0 in test_verifier
Test the following narrow loads in test_verifier for context __sk_buff:
* off=1, size=1 - ok;
* off=2, size=1 - ok;
* off=3, size=1 - ok;
* off=0, size=2 - ok;
* off=1, size=2 - fail;
* off=0, size=2 - ok;
* off=3, size=2 - fail.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 22:29:59 -08:00
Stanislav Fomichev 092f089273 bpftool: support loading flow dissector
This commit adds support for loading/attaching/detaching flow
dissector program.

When `bpftool loadall` is called with a flow_dissector prog (i.e. when the
'type flow_dissector' argument is passed), we load and pin all programs.
User is responsible to construct the jump table for the tail calls.

The last argument of `bpftool attach` is made optional for this use
case.

Example:
bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
        /sys/fs/bpf/flow type flow_dissector \
	pinmaps /sys/fs/bpf/flow

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 0 0 0 0 \
        value pinned /sys/fs/bpf/flow/IP

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 1 0 0 0 \
        value pinned /sys/fs/bpf/flow/IPV6

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 2 0 0 0 \
        value pinned /sys/fs/bpf/flow/IPV6OP

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 3 0 0 0 \
        value pinned /sys/fs/bpf/flow/IPV6FR

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 4 0 0 0 \
        value pinned /sys/fs/bpf/flow/MPLS

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 5 0 0 0 \
        value pinned /sys/fs/bpf/flow/VLAN

bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector

Tested by using the above lines to load the prog in
the test_flow_dissector.sh selftest.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:56:11 -08:00
Stanislav Fomichev 3767a94b32 bpftool: add pinmaps argument to the load/loadall
This new additional argument lets users pin all maps from the object at
specified path.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:56:11 -08:00
Stanislav Fomichev 77380998d9 bpftool: add loadall command
This patch adds new *loadall* command which slightly differs from the
existing *load*. *load* command loads all programs from the obj file,
but pins only the first programs. *loadall* pins all programs from the
obj file under specified directory.

The intended usecase is flow_dissector, where we want to load a bunch
of progs, pin them all and after that construct a jump table.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:56:11 -08:00
Stanislav Fomichev 33a2c75c55 libbpf: add internal pin_name
pin_name is the same as section_name where '/' is replaced
by '_'. bpf_object__pin_programs is converted to use pin_name
to avoid the situation where section_name would require creating another
subdirectory for a pin (as, for example, when calling bpf_object__pin_programs
for programs in sections like "cgroup/connect6").

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:56:11 -08:00
Stanislav Fomichev fd734c5cca libbpf: bpf_program__pin: add special case for instances.nr == 1
When bpf_program has only one instance, don't create a subdirectory with
per-instance pin files (<prog>/0). Instead, just create a single pin file
for that single instance. This simplifies object pinning by not creating
unnecessary subdirectories.

This can potentially break existing users that depend on the case
where '/0' is always created. However, I couldn't find any serious
usage of bpf_program__pin inside the kernel tree and I suppose there
should be none outside.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:56:10 -08:00
Stanislav Fomichev 0c19a9fbc9 libbpf: cleanup after partial failure in bpf_object__pin
bpftool will use bpf_object__pin in the next commits to pin all programs
and maps from the file; in case of a partial failure, we need to get
back to the clean state (undo previous program/map pins).

As part of a cleanup, I've added and exported separate routines to
pin all maps (bpf_object__pin_maps) and progs (bpf_object__pin_programs)
of an object.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:56:10 -08:00
Stanislav Fomichev 108d50a976 selftests/bpf: rename flow dissector section to flow_dissector
Makes it compatible with the logic that derives program type
from section name in libbpf_prog_type_by_name.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:56:10 -08:00
Li Zhijian da85d8bfd1 kselftests/bpf: use ping6 as the default ipv6 ping binary when it exists
At commit deee2cae27 ("kselftests/bpf: use ping6 as the default ipv6 ping
binary if it exists"), it fixed similar issues for shell script, but it
missed a same issue in the C code.

Fixes: 371e4fcc9d ("selftests/bpf: cgroup local storage-based network counters")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
CC: Philip Li <philip.li@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 10:55:09 +01:00
David Ahern bf598a8f0f bpftool: Improve handling of ENOENT on map dumps
bpftool output is not user friendly when dumping a map with only a few
populated entries:

    $ bpftool map
    1: devmap  name tx_devmap  flags 0x0
            key 4B  value 4B  max_entries 64  memlock 4096B
    2: array  name tx_idxmap  flags 0x0
            key 4B  value 4B  max_entries 64  memlock 4096B

    $ bpftool map dump id 1
    key:
    00 00 00 00
    value:
    No such file or directory
    key:
    01 00 00 00
    value:
    No such file or directory
    key:
    02 00 00 00
    value:
    No such file or directory
    key: 03 00 00 00  value: 03 00 00 00

Handle ENOENT by keeping the line format sane and dumping
"<no entry>" for the value

    $ bpftool map dump id 1
    key: 00 00 00 00  value: <no entry>
    key: 01 00 00 00  value: <no entry>
    key: 02 00 00 00  value: <no entry>
    key: 03 00 00 00  value: 03 00 00 00
    ...

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 10:03:59 +01:00
Sowmini Varadhan 435f90a338 selftests/bpf: add a test case for sock_ops perf-event notification
This patch provides a tcp_bpf based eBPF sample. The test

- ncat(1) as the TCP client program to connect() to a port
  with the intention of triggerring SYN retransmissions: we
  first install an iptables DROP rule to make sure ncat SYNs are
  resent (instead of aborting instantly after a TCP RST)

- has a bpf kernel module that sends a perf-event notification for
  each TCP retransmit, and also tracks the number of such notifications
  sent in the global_map

The test passes when the number of event notifications intercepted
in user-space matches the value in the global_map.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 09:40:17 +01:00
Quentin Monnet f98e46a251 tools: bpftool: update references to other man pages in documentation
Update references to other bpftool man pages at the bottom of each
manual page. Also reference the "bpf(2)" and "bpf-helpers(7)" man pages.

References are sorted by number of man section, then by
"prog-and-map-go-first", the other pages in alphabetical order.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 08:20:52 +01:00
Quentin Monnet f120919f99 tools: bpftool: pass an argument to silence open_obj_pinned()
Function open_obj_pinned() prints error messages when it fails to open a
link in the BPF virtual file system. However, in some occasions it is
not desirable to print an error, for example when we parse all links
under the bpffs root, and the error is due to some paths actually being
symbolic links.

Example output:

    # ls -l /sys/fs/bpf/
    lrwxrwxrwx 1 root root 0 Oct 18 19:00 ip -> /sys/fs/bpf/tc/
    drwx------ 3 root root 0 Oct 18 19:00 tc
    lrwxrwxrwx 1 root root 0 Oct 18 19:00 xdp -> /sys/fs/bpf/tc/

    # bpftool --bpffs prog show
    Error: bpf obj get (/sys/fs/bpf): Permission denied
    Error: bpf obj get (/sys/fs/bpf): Permission denied

    # strace -e bpf bpftool --bpffs prog show
    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/ip", bpf_fd=0}, 72) = -1 EACCES (Permission denied)
    Error: bpf obj get (/sys/fs/bpf): Permission denied
    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/xdp", bpf_fd=0}, 72) = -1 EACCES (Permission denied)
    Error: bpf obj get (/sys/fs/bpf): Permission denied
    ...

To fix it, pass a bool as a second argument to the function, and prevent
it from printing an error when the argument is set to true.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 08:20:52 +01:00
Quentin Monnet a8bfd2bc29 tools: bpftool: fix plain output and doc for --bpffs option
Edit the documentation of the -f|--bpffs option to make it explicit that
it dumps paths of pinned programs when bpftool is used to list the
programs only, so that users do not believe they will see the name of
the newly pinned program with "bpftool prog pin" or "bpftool prog load".

Also fix the plain output: do not add a blank line after each program
block, in order to remain consistent with what bpftool does when the
option is not passed.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 08:20:52 +01:00
Quentin Monnet 53909030aa tools: bpftool: prevent infinite loop in get_fdinfo()
Function getline() returns -1 on failure to read a line, thus creating
an infinite loop in get_fdinfo() if the key is not found. Fix it by
calling the function only as long as we get a strictly positive return
value.

Found by copying the code for a key which is not always present...

Fixes: 71bb428fe2 ("tools: bpf: add bpftool")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 08:20:52 +01:00
Yonghong Song 49a249c387 tools/bpftool: copy a few net uapi headers to tools directory
Commit f6f3bac08f ("tools/bpf: bpftool: add net support")
added certain networking support to bpftool.
The implementation relies on a relatively recent uapi header file
linux/tc_act/tc_bpf.h on the host which contains the marco
definition of TCA_ACT_BPF_ID.

Unfortunately, this is not the case for all distributions.
See the email message below where rhel-7.2 does not have
an up-to-date linux/tc_act/tc_bpf.h.
  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1799211.html
Further investigation found that linux/pkt_cls.h is also needed for macro
TCA_BPF_TAG.

This patch fixed the issue by copying linux/tc_act/tc_bpf.h
and linux/pkt_cls.h from kernel include/uapi directory to
tools/include/uapi directory so building the bpftool does not depend
on host system for these files.

Fixes: f6f3bac08f ("tools/bpf: bpftool: add net support")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Cc: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 08:18:31 +01:00
Stefano Brivio 56fd865f46 selftests: pmtu: Introduce FoU and GUE PMTU exceptions tests
Introduce eight tests, for FoU and GUE, with IPv4 and IPv6 payload,
on IPv4 and IPv6 transport, that check that PMTU exceptions are created
with the right value when exceeding the MTU on a link of the path.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio ce7336610c selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv4/IPv6
Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio 582888792f selftests: pmtu: Introduce tests for IPv4/IPv6 over VXLAN over IPv4/IPv6
Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Change all occurrences of VxLAN to VXLAN (Jiri Benc)
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Paolo Abeni 3327a9c463 selftests: add functionals test for UDP GRO
Extends the existing udp programs to allow checking for proper
GRO aggregation/GSO size, and run the tests via a shell script, using
a veth pair with XDP program attached to trigger the GRO code path.

rfc v3 -> v1:
 - use ip route to attach the xdp helper to the veth

rfc v2 -> rfc v3:
 - add missing test program options documentation
 - fix sporatic test failures (receiver faster than sender)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:05 -08:00
Paolo Abeni e87f53b4fa selftests: add some benchmark for UDP GRO
Run on top of veth pair, using a dummy XDP program to enable the GRO.

 rfc v3 -> v1:
  - use ip route to attach the xdp helper to the veth

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:05 -08:00
Paolo Abeni bd8e1afe64 selftests: add dummy xdp test helper
This trivial XDP program does nothing, but will be used by the
next patch to test the GRO path in a net namespace, leveraging
the veth XDP implementation.

It's added here, despite its 'net' usage, to avoid the duplication
of the llc-related makefile boilerplate.

rfc v3 -> v1:
 - move the helper implementation into the bpf directory, don't
   touch udpgso_bench_rx

rfc v2 -> rfc v3:
 - move 'x' option handling here

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:05 -08:00
Paolo Abeni 0a9ac2e954 selftests: add GRO support to udp bench rx program
And fix a couple of buglets (port option processing,
clean termination on SIGINT). This is preparatory work
for GRO tests.

rfc v2 -> rfc v3:
 - use ETH_MAX_MTU

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:05 -08:00
Quentin Monnet 8302b9bd31 tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps
The limit for memory locked in the kernel by a process is usually set to
64 kbytes by default. This can be an issue when creating large BPF maps
and/or loading many programs. A workaround is to raise this limit for
the current process before trying to create a new BPF map. Changing the
hard limit requires the CAP_SYS_RESOURCE and can usually only be done by
root user (for non-root users, a call to setrlimit fails (and sets
errno) and the program simply goes on with its rlimit unchanged).

There is no API to get the current amount of memory locked for a user,
therefore we cannot raise the limit only when required. One solution,
used by bcc, is to try to create the map, and on getting a EPERM error,
raising the limit to infinity before giving another try. Another
approach, used in iproute2, is to raise the limit in all cases, before
trying to create the map.

Here we do the same as in iproute2: the rlimit is raised to infinity
before trying to load programs or to create maps with bpftool.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-07 22:22:21 +01:00
Quentin Monnet f96afa767b selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
libbpf is now able to load successfully test_l4lb_noinline.o and
samples/bpf/tracex3_kern.o.

For the test_l4lb_noinline, uncomment related tests from test_libbpf.c
and remove the associated "TODO".

For tracex3_kern.o, instead of loading a program from samples/bpf/ that
might not have been compiled at this stage, try loading a program from
BPF selftests. Since this test case is about loading a program compiled
without the "-target bpf" flag, change the Makefile to compile one
program accordingly (instead of passing the flag for compiling all
programs).

Regarding test_xdp_noinline.o: in its current shape the program fails to
load because it provides no version section, but the loader needs one.
The test was added to make sure that libbpf could load XDP programs even
if they do not provide a version number in a dedicated section. But
libbpf is already capable of doing that: in our case loading fails
because the loader does not know that this is an XDP program (it does
not need to, since it does not attach the program). So trying to load
test_xdp_noinline.o does not bring much here: just delete this subtest.

For the record, the error message obtained with tracex3_kern.o was
fixed by commit e3d91b0ca5 ("tools/libbpf: handle issues with bpf ELF
objects containing .eh_frames")

I have not been abled to reproduce the "libbpf: incorrect bpf_call
opcode" error for test_l4lb_noinline.o, even with the version of libbpf
present at the time when test_libbpf.sh and test_libbpf_open.c were
created.

RFC -> v1:
- Compile test_xdp without the "-target bpf" flag, and try to load it
  instead of ../../samples/bpf/tracex3_kern.o.
- Delete test_xdp_noinline.o subtest.

Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-07 22:20:56 +01:00
Ingo Molnar 45fd808091 perf/urgent improvements and fixes:
Intel PT sql viewer: (Adrian Hunter)
 
 - Fall back to /usr/local/lib/libxed.so
 - Add Selected branches report
 - Add help window
 - Fix table find when table re-ordered
 
 Intel PT debug log (Adrian Hunter)
 
 - Add more event information
 - Add MTC and CYC timestamps
 
 perf record: (Andi Kleen)
 
 - Support weak groups, just like with 'perf stat'
 
 perf trace: (Arnaldo Carvalho de Melo)
 
 - Start augmenting raw_syscalls:{sys_enter,sys_exit}: goal is to have a
   generic, arch independent eBPF kernel component that is programmed with
   syscall table details, what to copy, how many bytes, pid, arg filters from the
   userspace via eBPF maps by the 'perf trace' tool that continues to use all its
   argument beautifiers, just taking advantage of the extra pointer contents.
 
 JVMTI: (Gustavo Romero)
 
 - Fix undefined symbol scnprintf in libperf-jvmti.so
 
 perf top: (Jin Yao)
 
 - Display the LBR stats in callchain entries
 
 perf stat: (Thomas Richter)
 
 - Handle different PMU names with common prefix
 
 arm64: Will (Deacon)
 
 - Fix arm64 tools build failure wrt smp_load_{acquire,release}.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCW+GBMAAKCRCyPKLppCJ+
 J5hwAP9+7F2HKvjwHj4g6YeAvCp2WzXbO9UzakfTNtkAwWDZHwD/aN8T8RdgiaCm
 FqlDoftwvSQSpbKvaiN7M1GSk14a+AQ=
 =gWMp
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.20-20181106' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent improvements and fixes from Arnaldo Carvalho de Melo:

Intel PT SQL viewer: (Adrian Hunter)

- Fall back to /usr/local/lib/libxed.so
- Add Selected branches report
- Add help window
- Fix table find when table re-ordered

Intel PT debug log (Adrian Hunter)

- Add more event information
- Add MTC and CYC timestamps

perf record: (Andi Kleen)

- Support weak groups, just like with 'perf stat'

perf trace: (Arnaldo Carvalho de Melo)

- Start augmenting raw_syscalls:{sys_enter,sys_exit}: goal is to have a
  generic, arch independent eBPF kernel component that is programmed with
  syscall table details, what to copy, how many bytes, pid, arg filters from the
  userspace via eBPF maps by the 'perf trace' tool that continues to use all its
  argument beautifiers, just taking advantage of the extra pointer contents.

JVMTI: (Gustavo Romero)

- Fix undefined symbol scnprintf in libperf-jvmti.so

perf top: (Jin Yao)

- Display the LBR stats in callchain entries

perf stat: (Thomas Richter)

- Handle different PMU names with common prefix

arm64: Will (Deacon)

- Fix arm64 tools build failure wrt smp_load_{acquire,release}.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-06 20:03:11 +01:00
Jiri Olsa dbc4ca339c tools cpupower: Override CFLAGS assignments
So user could specify outside CFLAGS values.

Cc: Thomas Renninger <trenn@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-11-06 08:54:16 -07:00
Jiri Olsa 4bf3bd0f15 tools cpupower debug: Allow to use outside build flags
Adding CFLAGS and LDFLAGS to be used during the build.

Cc: Thomas Renninger <trenn@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-11-06 08:54:16 -07:00
Konstantin Khlebnikov 9de9aa45e9 tools/power/cpupower: fix compilation with STATIC=true
Rename duplicate sysfs_read_file into cpupower_read_sysfs and fix linking.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Thomas Renninger <trenn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-11-06 08:54:16 -07:00
Jiri Olsa 8e88c29b35 perf tools: Do not zero sample_id_all for group members
Andi reported following malfunction:

  # perf record -e '{ref-cycles,cycles}:S' -a sleep 1
  # perf script
  non matching sample_id_all

That's because we disable sample_id_all bit for non-sampling group
members. We can't do that, because it needs to be the same over the
whole event list. This patch keeps it untouched again.

Reported-by: Andi Kleen <andi@firstfloor.org>
Tested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180923150420.27327-1-jolsa@kernel.org
Fixes: e9add8bac6 ("perf evsel: Disable write_backward for leader sampling group events")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-06 08:29:56 -03:00
Masayoshi Mizuma af31b04b67 tools/testing/nvdimm: Fix the array size for dimm devices.
KASAN reports following global out of bounds access while
nfit_test is being loaded. The out of bound access happens
the following reference to dimm_fail_cmd_flags[dimm]. 'dimm' is
over than the index value, NUM_DCR (==5).

  static int override_return_code(int dimm, unsigned int func, int rc)
  {
          if ((1 << func) & dimm_fail_cmd_flags[dimm]) {

dimm_fail_cmd_flags[] definition:
  static unsigned long dimm_fail_cmd_flags[NUM_DCR];

'dimm' is the return value of get_dimm(), and get_dimm() returns
the index of handle[] array. The handle[] has 7 index. Let's use
ARRAY_SIZE(handle) as the array size.

KASAN report:

==================================================================
BUG: KASAN: global-out-of-bounds in nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
Read of size 8 at addr ffffffffc10cbbe8 by task kworker/u41:0/8
...
Call Trace:
 dump_stack+0xea/0x1b0
 ? dump_stack_print_info.cold.0+0x1b/0x1b
 ? kmsg_dump_rewind_nolock+0xd9/0xd9
 print_address_description+0x65/0x22e
 ? nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
 kasan_report.cold.6+0x92/0x1a6
 nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
...
The buggy address belongs to the variable:
 dimm_fail_cmd_flags+0x28/0xffffffffffffa440 [nfit_test]
==================================================================

Fixes: 39611e83a2 ("tools/testing/nvdimm: Make DSM failure code injection...")
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-05 11:34:08 -08:00
Gustavo Romero 6ac2226229 perf tools: Fix undefined symbol scnprintf in libperf-jvmti.so
Currently jvmti agent can not be used because function scnprintf is not
present in the agent libperf-jvmti.so. As a result the JVM when using
such agent to record JITed code profiling information will fail on
looking up scnprintf:

  java: symbol lookup error: lib/libperf-jvmti.so: undefined symbol: scnprintf

This commit fixes that by reverting to the use of snprintf, that can be
looked up, instead of scnprintf, adding a proper check for the returned
value in order to print a better error message when the jitdump file
pathname is too long. Checking the returned value also helps to comply
with some recent gcc versions, like gcc8, which will fail due to
truncated writing checks related to the -Werror=format-truncation= flag.

Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 1541117601-18937-2-git-send-email-gromero@linux.vnet.ibm.com
Link: https://lkml.kernel.org/n/tip-mvpxxxy7wnzaj74cq75muw3f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 16:28:00 -03:00
Arnaldo Carvalho de Melo e2c39f36c3 perf beauty: Use SRCARCH, ARCH=x86_64 must map to "x86" to find the headers
Guenter reported that using ARCH=x86_64 to build perf has regressed:

  $ make -C tools/perf O=/tmp/build/perf ARCH=x86_64
  make: Entering directory '/home/acme/git/perf/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
    HOSTCC   /tmp/build/perf/fixdep.o
    HOSTLD   /tmp/build/perf/fixdep-in.o
    LINK     /tmp/build/perf/fixdep

  Auto-detecting system features:
  ...                         dwarf: [ on  ]
  <SNIP>
  ...                           bpf: [ on  ]

    GEN      /tmp/build/perf/common-cmds.h
  make[2]: *** No rule to make target '/home/acme/git/perf/tools/arch/x86_64/include/uapi/asm//mman.h', needed by '/tmp/build/perf/trace/beauty/generated/mmap_flags_array.c'.  Stop.
  make[2]: *** Waiting for unfinished jobs....
    PERF_VERSION = 4.19.gf6c23e3
  make[1]: *** [Makefile.perf:207: sub-make] Error 2
  make: *** [Makefile:70: all] Error 2
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $

This is because we must use $(SRCARCH) where we were using $(ARCH), so
that, just like the top level Makefile, we get this done:

  # Additional ARCH settings for x86
  ifeq ($(ARCH),i386)
          SRCARCH := x86
  endif
  ifeq ($(ARCH),x86_64)
          SRCARCH := x86
  endif

Which is done in tools/scripts/Makefile.arch, so switch to use
$(SRCARCH).

Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: fbd7458db7 ("perf beauty: Wire up the mmap flags table generator to the Makefile")
Link: https://lkml.kernel.org/r/20181105184612.GD7077@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 15:46:51 -03:00
Adrian Hunter f6c23e3b55 perf intel-pt: Add MTC and CYC timestamps to debug log
One cause of decoding errors is un-synchronized side-band data.
Timestamps are needed to debug such cases. TSC packet timestamps are
logged. Log also MTC and CYC timestamps.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/20181105073505.8129-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:53:54 -03:00
Adrian Hunter 93f8be2799 perf intel-pt: Add more event information to debug log
More event information is useful for debugging, especially MMAP events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/20181105073505.8129-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:53:37 -03:00
Adrian Hunter 35fa1cee21 perf scripts python: exported-sql-viewer.py: Fix table find when table re-ordered
Table rows can be re-ordered by selecting a column to sort by. After
re-ordering, the "find" operation was highlighting the wrong row, fix
it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20181104151238.15947-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:53:00 -03:00
Adrian Hunter 65b24292e8 perf scripts python: exported-sql-viewer.py: Add help window
Add a window to display help. It is also possible to display the help
only, by using the option "--help-only" instead of a database name.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20181104151238.15947-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:52:45 -03:00
Adrian Hunter 210cf1f961 perf scripts python: exported-sql-viewer.py: Add Selected branches report
Fetching data from the database can be slow. Add a report that provides
the ability to select a subset of branches.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20181104151238.15947-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:51:55 -03:00
Adrian Hunter 5ed4419d47 perf scripts python: exported-sql-viewer.py: Fall back to /usr/local/lib/libxed.so
Fall back to /usr/local/lib/libxed.so to cater for distributions that do
not have /usr/local/lib in the library path by default.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20181104151238.15947-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:51:31 -03:00
Jin Yao 590ac60d8a perf top: Display the LBR stats in callchain entry
'perf report' has supported the displaying of LBR stats (such as cycles,
predicted%) in callchain entry.

For example:

  $ perf report --branch-history --stdio

  --1.01%--intel_idle mwait.h:29
            intel_idle cpufeature.h:164 (cycles:5)
            intel_idle cpufeature.h:164 (predicted:76.4%)
            intel_idle mwait.h:102 (cycles:41)
            intel_idle current.h:15

While 'perf top' doesn't support that.

For example:

  $ perf top -a -b --call-graph branch

  -   13.86%     0.23%  [kernel]		[k] __x86_indirect_thunk_rax
     - 13.65% __x86_indirect_thunk_rax
        + 1.69% do_syscall_64
        + 1.68% do_select
        + 1.41% ktime_get
        + 0.70% __schedule
        + 0.62% do_sys_poll
          0.58% __x86_indirect_thunk_rax

Actually it's very easy to enable this feature in 'perf top'.

With this patch, the result is:

  $ perf top -a -b --call-graph branch

  $ -   13.58%     0.00%  [kernel]		[k] __x86_indirect_thunk_rax
     $ - 13.57% __x86_indirect_thunk_rax (predicted:93.9%)
        $ + 1.78% do_select (cycles:2)
        $ + 1.68% perf_pmu_disable.part.99 (cycles:1)
        $ + 1.45% ___sys_recvmsg (cycles:25)
        $ + 0.81% unix_stream_sendmsg (cycles:18)
        $ + 0.80% ktime_get (cycles:400)
          $ 0.58% pick_next_task_fair (cycles:47)
        $ + 0.56% i915_request_retire (cycles:2)
        $ + 0.52% do_sys_poll (cycles:4)

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1540983995-20462-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:37:11 -03:00
Thomas Richter ea1fa48c05 perf stat: Handle different PMU names with common prefix
On s390 the CPU Measurement Facility for counters now supports
2 PMUs named cpum_cf (CPU Measurement Facility for counters) and
cpum_cf_diag (CPU Measurement Facility for diagnostic counters)
for one and the same CPU.

Running command

 [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
	 -- ~/mytests/cf-tx-events 1

 Measuring transactions
 TX_C_TABORT_NO_SPECIAL: 0 expected:0
 TX_C_TABORT_SPECIAL: 0 expected:0
 TX_C_TEND: 1 expected:1
 TX_NC_TABORT: 11 expected:11
 TX_NC_TEND: 1 expected:1

 Performance counter stats for '/root/mytests/cf-tx-events 1':

  2      tx_c_tend

      0.002120091 seconds time elapsed

      0.000121000 seconds user
      0.002127000 seconds sys

 [root@s35lp76 perf]#

displays output which is unexpected (and wrong):

  2      tx_c_tend

The test program definitely triggers only one transaction, as shown
in line 'TX_C_TEND: 1 expected:1'.

This is caused by the following call sequence:

pmu_lookup() scans and installs a PMU.
+--> pmu_aliases() parses all aliases in directory
		.../<pmu-name>/events/* which are file names.
     +--> pmu_aliases_parse() Read each file in directory and create
                      an new alias entry. This is done with
          +--> perf_pmu__new_alias() and
	       +--> __perf_pmu__new_alias() which also check for
	                   identical alias names.

After pmu_aliases() returns, a complete list of event names
for this pmu has been created. Now function

pmu_add_cpu_aliases()   is called to add the events listed in the json
|                       files to the alias list of the cpu.
+--> perf_pmu__find_map()  Returns a pointer to the json events.

Now function pmu_add_cpu_aliases() scans through all events listed
in the JSON files for this CPU.
Each json event pmu name is compared with the current PMU being
built up and if they mismatch, the json event is added to the
current PMUs alias list.
To avoid duplicate entries the following comparison is done:

	if (!is_arm_pmu_core(name)) {
	     pname = pe->pmu ? pe->pmu : "cpu";
	     if (strncmp(pname, name, strlen(pname)))
		     continue;
     }

The culprit is the strncmp() function.

Using current s390 PMU naming, the first PMU is 'cpum_cf'
and a long list of events is added, among them 'tx_c_tend'

When the second PMU named 'cpum_cf_diag' is added, only one event
named 'CF_DIAG' is added by the pmu_aliases()  function.

Now function pmu_add_cpu_aliases() is invoked for PMU 'cpum_cf_diag'.
Since the CPUID string is the same for both PMUs, json file events
for PMU named 'cpum_cf' are added to the PMU 'cpm_cf_diag'

This happens because the strncmp() actually compares:

     strncmp("cpum_cf", "cpum_cf_diag", 6);

The first parameter is the pmu name taken from the event in
the json file. The second parameter is the pmu name of the PMU
currently being built.
They are different, but the length of the compare only tests the
common prefix and this returns 0(true) when it should return false.

Now all events for PMU cpum_cf are added to the alias list for pmu
cpum_cf_diag.

Later on in function parse_events_add_pmu() the event 'tx_c_end' is
searched in all available PMUs and found twice, adding it two
times to the evsel_list global variable which is the root
of all events. This results in a counter value of 2 instead
of 1.

Output with this patch:

 [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
			-- ~/mytests/cf-tx-events 1
 Measuring transactions
 TX_C_TABORT_NO_SPECIAL: 0 expected:0
 TX_C_TABORT_SPECIAL: 0 expected:0
 TX_C_TEND: 1 expected:1
 TX_NC_TABORT: 11 expected:11
 TX_NC_TEND: 1 expected:1

 Performance counter stats for '/root/mytests/cf-tx-events 1':

                  1      tx_c_tend

      0.001815365 seconds time elapsed

      0.000123000 seconds user
      0.001756000 seconds sys

 [root@s35lp76 perf]#

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Sebastien Boisvert <sboisvert@gydle.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 292c34c102 ("perf pmu: Fix core PMU alias list for X86 platform")
Link: http://lkml.kernel.org/r/20181023151616.78193-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:37:10 -03:00
Andi Kleen cf99ad1424 perf record: Support weak groups
Implement a weak group fallback for 'perf record', similar to the
existing 'perf stat' support.  This allows to use groups that might be
longer than the available counters without failing.

Before:

  $ perf record  -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}' -a sleep 1
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
  /bin/dmesg | grep -i perf may provide additional information.

After:

  $ ./perf record  -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}:W' -a sleep 1
  WARNING: No sample_id_all support, falling back to unordered processing
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 8.136 MB perf.data (134069 samples) ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181001195927.14211-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:37:10 -03:00
Andi Kleen c3537fc251 perf evlist: Move perf_evsel__reset_weak_group into evlist
- Move the function from builtin-stat to evlist for reuse
- Rename to evlist to match purpose better
- Pass the evlist as first argument.
- No functional changes

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181001195927.14211-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 14:37:09 -03:00
Arnaldo Carvalho de Melo 79ef68c7e1 perf augmented_syscalls: Start collecting pathnames in the BPF program
This is the start of having the raw_syscalls:sys_enter BPF handler
collecting pointer arguments, namely pathnames, and with two syscalls
that have that pointer in different arguments, "open" as it as its first
argument, "openat" as the second.

With this in place the existing beautifiers in 'perf trace' works, those
args are shown instead of just the pointer that comes with the syscalls
tracepoints.

This also serves to show and document pitfalls in the process of using
just that place in the kernel (raw_syscalls:sys_enter) plus tables
provided by userspace to collect syscall pointer arguments.

One is the need to use a barrier, as suggested by Edward, to avoid clang
optimizations that make the kernel BPF verifier to refuse loading our
pointer contents collector.

The end result should be a generic eBPF program that works in all
architectures, with the differences amongst archs resolved by the
userspace component, 'perf trace', that should get all its tables
created automatically from the kernel components where they are defined,
via string table constructors for things not expressed in BTF/DWARF
(enums, structs, etc), and otherwise using those observability files
(BTF).

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/n/tip-37dz54pmotgpnwg9tb6zuk9j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05 12:41:10 -03:00
Linus Torvalds 601a88077c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A number of fixes and some late updates:

   - make in_compat_syscall() behavior on x86-32 similar to other
     platforms, this touches a number of generic files but is not
     intended to impact non-x86 platforms.

   - objtool fixes

   - PAT preemption fix

   - paravirt fixes/cleanups

   - cpufeatures updates for new instructions

   - earlyprintk quirk

   - make microcode version in sysfs world-readable (it is already
     world-readable in procfs)

   - minor cleanups and fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compat: Cleanup in_compat_syscall() callers
  x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
  objtool: Support GCC 9 cold subfunction naming scheme
  x86/numa_emulation: Fix uniform-split numa emulation
  x86/paravirt: Remove unused _paravirt_ident_32
  x86/mm/pat: Disable preemption around __flush_tlb_all()
  x86/paravirt: Remove GPL from pv_ops export
  x86/traps: Use format string with panic() call
  x86: Clean up 'sizeof x' => 'sizeof(x)'
  x86/cpufeatures: Enumerate MOVDIR64B instruction
  x86/cpufeatures: Enumerate MOVDIRI instruction
  x86/earlyprintk: Add a force option for pciserial device
  objtool: Support per-function rodata sections
  x86/microcode: Make revision and processor flags world-readable
2018-11-03 18:25:17 -07:00
Linus Torvalds 01897f3e05 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates and fixes from Ingo Molnar:
 "These are almost all tooling updates: 'perf top', 'perf trace' and
  'perf script' fixes and updates, an UAPI header sync with the merge
  window versions, license marker updates, much improved Sparc support
  from David Miller, and a number of fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
  perf intel-pt/bts: Calculate cpumode for synthesized samples
  perf intel-pt: Insert callchain context into synthesized callchains
  perf tools: Don't clone maps from parent when synthesizing forks
  perf top: Start display thread earlier
  tools headers uapi: Update linux/if_link.h header copy
  tools headers uapi: Update linux/netlink.h header copy
  tools headers: Sync the various kvm.h header copies
  tools include uapi: Update linux/mmap.h copy
  perf trace beauty: Use the mmap flags table generated from headers
  perf beauty: Wire up the mmap flags table generator to the Makefile
  perf beauty: Add a generator for MAP_ mmap's flag constants
  tools include uapi: Update asound.h copy
  tools arch uapi: Update asm-generic/unistd.h and arm64 unistd.h copies
  tools include uapi: Update linux/fs.h copy
  perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}
  perf cs-etm: Correct CPU mode for samples
  perf unwind: Take pgoff into account when reporting elf to libdwfl
  perf top: Do not use overwrite mode by default
  perf top: Allow disabling the overwrite mode
  perf trace: Beautify mount's first pathname arg
  ...
2018-11-03 18:13:43 -07:00
Ingo Molnar 23a12ddee1 Merge branch 'core/urgent' into x86/urgent, to pick up objtool fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-03 23:42:16 +01:00
Arnaldo Carvalho de Melo cd26ea6d50 perf trace: Fix setting of augmented payload when using eBPF + raw_syscalls
For now with BPF raw_augmented we hook into raw_syscalls:sys_enter and
there we get all 6 syscall args plus the tracepoint common fields
(sizeof(long)) and the syscall_nr (another long). So we check if that is
the case and if so don't look after the sc->args_size, but always after
the full raw_syscalls:sys_enter payload, which is fixed.

We'll revisit this later to pass s->args_size to the BPF augmenter (now
tools/perf/examples/bpf/augmented_raw_syscalls.c, so that it copies only
what we need for each syscall, like what happens when we use
syscalls:sys_enter_NAME, so that we reduce the kernel/userspace traffic
to just what is needed for each syscall.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-nlslrg8apxdsobt4pwl3n7ur@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-03 08:19:56 -03:00
Linus Torvalds b69f9e17a5 powerpc fixes for 4.20 #2
Some things that I missed due to travel, or that came in late.
 
 Two fixes also going to stable:
 
  - A revert of a buggy change to the 8xx TLB miss handlers.
 
  - Our flushing of SPE (Signal Processing Engine) registers on fork was broken.
 
 Other changes:
 
  - A change to the KVM decrementer emulation to use proper APIs.
 
  - Some cleanups to the way we do code patching in the 8xx code.
 
  - Expose the maximum possible memory for the system in /proc/powerpc/lparcfg.
 
  - Merge some updates from Scott: "a couple device tree updates, and a fix for a
    missing prototype warning."
 
 A few other minor fixes and a handful of fixes for our selftests.
 
 Thanks to:
   Aravinda Prasad, Breno Leitao, Camelia Groza, Christophe Leroy, Felipe Rechia,
   Joel Stanley, Naveen N. Rao, Paul Mackerras, Scott Wood, Tyrel Datwyler.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb3C+uAAoJEFHr6jzI4aWAPJgQAIX0aD/PiYfEUI/rm/Q0vnJI
 HO3FCKroi+LVF/URU24+NLA/1KGCBfO9by9m6D/nnmHl+vi2P69fFgokywO0Ajru
 nf9a+9Gx53IbO7EEUf1fZVwxCMobBqU8eWq1hIBndd5HTz9QEftc/RXgpgqZcQ/x
 x8xN1FNMSUT9NwMk750QDO7CFBrSfSjFC+/WrkViBaMiRWx2rwle+2tDipQ/fegY
 Tsu4wg0qEzWMT//MFP4yOlkcLV8M6d2Sw65Km59rWHA1I2wqsTek1yQ68Epo9Co0
 RdJh9Nt1kjLC5XXteneFhe18UUPKRmrXbYDFByw5CUhs5VI99Dq4w5kamh197XLr
 +jA3XHAeAyaXf21I9zmmZXbhHanowCPZGyzZqZXWJ86bVJp5v328wXmnxtKrb0Nz
 pH7fjQ6zjzsZgIcN9i2CFpIvuDQ/z1A/QyHdBnRvJ8HoXlTerZCn22JTgY7d2VJu
 XJn1n+VABG2BrzJexW/7quY3Z7V6tvdkloWwOA3PdAwkcoImd4BfYq9K2DU//diN
 BnXPnDs2K7JwDG9s+cgUEHnrP6DOKsxT+mmYpqXf0Ta0wtyoZJ1zgSdfZAUOmjnb
 MhcK46l+F4E891qjnsuuVNnspqI4yPMLmAGmife5OUrfcoFdm/vFbM75FaXameBx
 cMOmidrJJhO6z5eWSKvO
 =VCkW
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some things that I missed due to travel, or that came in late.

  Two fixes also going to stable:

   - A revert of a buggy change to the 8xx TLB miss handlers.

   - Our flushing of SPE (Signal Processing Engine) registers on fork
     was broken.

  Other changes:

   - A change to the KVM decrementer emulation to use proper APIs.

   - Some cleanups to the way we do code patching in the 8xx code.

   - Expose the maximum possible memory for the system in
     /proc/powerpc/lparcfg.

   - Merge some updates from Scott: "a couple device tree updates, and a
     fix for a missing prototype warning"

  A few other minor fixes and a handful of fixes for our selftests.

  Thanks to: Aravinda Prasad, Breno Leitao, Camelia Groza, Christophe
  Leroy, Felipe Rechia, Joel Stanley, Naveen N. Rao, Paul Mackerras,
  Scott Wood, Tyrel Datwyler"

* tag 'powerpc-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits)
  selftests/powerpc: Fix compilation issue due to asm label
  selftests/powerpc/cache_shape: Fix out-of-tree build
  selftests/powerpc/switch_endian: Fix out-of-tree build
  selftests/powerpc/pmu: Link ebb tests with -no-pie
  selftests/powerpc/signal: Fix out-of-tree build
  selftests/powerpc/ptrace: Fix out-of-tree build
  powerpc/xmon: Relax frame size for clang
  selftests: powerpc: Fix warning for security subdir
  selftests/powerpc: Relax L1d miss targets for rfi_flush test
  powerpc/process: Fix flush_all_to_thread for SPE
  powerpc/pseries: add missing cpumask.h include file
  selftests/powerpc: Fix ptrace tm failure
  KVM: PPC: Use exported tb_to_ns() function in decrementer emulation
  powerpc/pseries: Export maximum memory value
  powerpc/8xx: Use patch_site for perf counters setup
  powerpc/8xx: Use patch_site for memory setup patching
  powerpc/code-patching: Add a helper to get the address of a patch_site
  Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP"
  powerpc/8xx: add missing header in 8xx_mmu.c
  powerpc/8xx: Add DT node for using the SEC engine of the MPC885
  ...
2018-11-02 09:19:35 -07:00
Arnaldo Carvalho de Melo 3c5e3dabf3 perf trace: When augmenting raw_syscalls plug raw_syscalls:sys_exit too
With just this commit we get to support all syscalls via hooking
raw_syscalls:sys_{enter,exit} to the trace__sys_{enter,exit} routines
to combine, strace-like, those tracepoints.

  # trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
         ? (         ): sleep/31680  ... [continued]: execve()) = 0
     0.043 ( 0.004 ms): sleep/31680 brk() = 0x55652a851000
     0.070 ( 0.009 ms): sleep/31680 access(filename:, mode: R) = -1 ENOENT No such file or directory
     0.087 ( 0.006 ms): sleep/31680 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
     0.096 ( 0.003 ms): sleep/31680 fstat(fd: 3, statbuf: 0x7ffc5269e190) = 0
     0.101 ( 0.005 ms): sleep/31680 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) = 0x7f709c239000
     0.109 ( 0.002 ms): sleep/31680 close(fd: 3) = 0
     0.126 ( 0.006 ms): sleep/31680 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
     0.135 ( 0.003 ms): sleep/31680 read(fd: 3, buf: 0x7ffc5269e358, count: 832) = 832
     0.141 ( 0.002 ms): sleep/31680 fstat(fd: 3, statbuf: 0x7ffc5269e1f0) = 0
     0.146 ( 0.005 ms): sleep/31680 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7f709c237000
     0.159 ( 0.007 ms): sleep/31680 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7f709bc79000
     0.168 ( 0.009 ms): sleep/31680 mprotect(start: 0x7f709be26000, len: 2093056) = 0
     0.179 ( 0.010 ms): sleep/31680 mmap(addr: 0x7f709c025000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) = 0x7f709c025000
     0.196 ( 0.005 ms): sleep/31680 mmap(addr: 0x7f709c02b000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7f709c02b000
     0.210 ( 0.002 ms): sleep/31680 close(fd: 3) = 0
     0.230 ( 0.002 ms): sleep/31680 arch_prctl(option: 4098, arg2: 140121632638208) = 0
     0.306 ( 0.009 ms): sleep/31680 mprotect(start: 0x7f709c025000, len: 16384, prot: READ) = 0
     0.338 ( 0.005 ms): sleep/31680 mprotect(start: 0x556529607000, len: 4096, prot: READ) = 0
     0.348 ( 0.005 ms): sleep/31680 mprotect(start: 0x7f709c253000, len: 4096, prot: READ) = 0
     0.356 ( 0.019 ms): sleep/31680 munmap(addr: 0x7f709c239000, len: 103334) = 0
     0.463 ( 0.002 ms): sleep/31680 brk() = 0x55652a851000
     0.468 ( 0.004 ms): sleep/31680 brk(brk: 0x55652a872000) = 0x55652a872000
     0.474 ( 0.002 ms): sleep/31680 brk() = 0x55652a872000
     0.484 ( 0.008 ms): sleep/31680 open(filename: , flags: CLOEXEC) = 3
     0.497 ( 0.002 ms): sleep/31680 fstat(fd: 3, statbuf: 0x7f709c02aaa0) = 0
     0.501 ( 0.006 ms): sleep/31680 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f70950aa000
     0.514 ( 0.002 ms): sleep/31680 close(fd: 3) = 0
     0.554 (1000.140 ms): sleep/31680 nanosleep(rqtp: 0x7ffc5269eed0) = 0
  1000.734 ( 0.007 ms): sleep/31680 close(fd: 1) = 0
  1000.748 ( 0.004 ms): sleep/31680 close(fd: 2) = 0
  1000.769 (         ): sleep/31680 exit_group()
  #

Now to allow selecting which syscalls should be traced, using a map.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-votqqmqhag8e1i9mgyzfez3o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-01 14:11:52 -03:00
Arnaldo Carvalho de Melo febf8a3712 perf examples bpf: Start augmenting raw_syscalls:sys_{start,exit}
The previous approach of attaching to each syscall showed how it is
possible to augment tracepoints and use that augmentation, pointer
payloads, in the existing beautifiers in 'perf trace', but for a more
general solution we now will try to augment the main
raw_syscalls:sys_{enter,exit} syscalls, and then pass instructions in
maps so that it knows which syscalls and which pointer contents, and how
many bytes for each of the arguments should be copied.

Start with just the bare minimum to collect what is provided by those
two tracepoints via the __augmented_syscalls__ map + bpf-output perf
event, which results in perf trace showing them without connecting
enter+exit:

  # perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
     0.000 sleep/11563 raw_syscalls:sys_exit:NR 59 = 0
     0.019 (         ): sleep/11563 brk() ...
     0.021 sleep/11563 raw_syscalls:sys_exit:NR 12 = 94682642325504
     0.033 (         ): sleep/11563 access(filename:, mode: R) ...
     0.037 sleep/11563 raw_syscalls:sys_exit:NR 21 = -2
     0.041 (         ): sleep/11563 openat(dfd: CWD, filename: , flags: CLOEXEC) ...
     0.044 sleep/11563 raw_syscalls:sys_exit:NR 257 = 3
     0.045 (         ): sleep/11563 fstat(fd: 3, statbuf: 0x7ffdbf7119b0) ...
     0.046 sleep/11563 raw_syscalls:sys_exit:NR 5 = 0
     0.047 (         ): sleep/11563 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) ...
     0.049 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196285493248
     0.050 (         ): sleep/11563 close(fd: 3) ...
     0.051 sleep/11563 raw_syscalls:sys_exit:NR 3 = 0
     0.059 (         ): sleep/11563 openat(dfd: CWD, filename: , flags: CLOEXEC) ...
     0.062 sleep/11563 raw_syscalls:sys_exit:NR 257 = 3
     0.063 (         ): sleep/11563 read(fd: 3, buf: 0x7ffdbf711b78, count: 832) ...
     0.065 sleep/11563 raw_syscalls:sys_exit:NR 0 = 832
     0.066 (         ): sleep/11563 fstat(fd: 3, statbuf: 0x7ffdbf711a10) ...
     0.067 sleep/11563 raw_syscalls:sys_exit:NR 5 = 0
     0.068 (         ): sleep/11563 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) ...
     0.070 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196285485056
     0.073 (         ): sleep/11563 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) ...
     0.076 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196279463936
     0.077 (         ): sleep/11563 mprotect(start: 0x7f81fd8a8000, len: 2093056) ...
     0.083 sleep/11563 raw_syscalls:sys_exit:NR 10 = 0
     0.084 (         ): sleep/11563 mmap(addr: 0x7f81fdaa7000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) ...
     0.088 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196283314176
     0.091 (         ): sleep/11563 mmap(addr: 0x7f81fdaad000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) ...
     0.093 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196283338752
     0.097 (         ): sleep/11563 close(fd: 3) ...
     0.098 sleep/11563 raw_syscalls:sys_exit:NR 3 = 0
     0.107 (         ): sleep/11563 arch_prctl(option: 4098, arg2: 140196285490432) ...
     0.108 sleep/11563 raw_syscalls:sys_exit:NR 158 = 0
     0.143 (         ): sleep/11563 mprotect(start: 0x7f81fdaa7000, len: 16384, prot: READ) ...
     0.146 sleep/11563 raw_syscalls:sys_exit:NR 10 = 0
     0.157 (         ): sleep/11563 mprotect(start: 0x561d037e7000, len: 4096, prot: READ) ...
     0.160 sleep/11563 raw_syscalls:sys_exit:NR 10 = 0
     0.163 (         ): sleep/11563 mprotect(start: 0x7f81fdcd5000, len: 4096, prot: READ) ...
     0.165 sleep/11563 raw_syscalls:sys_exit:NR 10 = 0
     0.166 (         ): sleep/11563 munmap(addr: 0x7f81fdcbb000, len: 103334) ...
     0.174 sleep/11563 raw_syscalls:sys_exit:NR 11 = 0
     0.216 (         ): sleep/11563 brk() ...
     0.217 sleep/11563 raw_syscalls:sys_exit:NR 12 = 94682642325504
     0.217 (         ): sleep/11563 brk(brk: 0x561d05453000) ...
     0.219 sleep/11563 raw_syscalls:sys_exit:NR 12 = 94682642460672
     0.220 (         ): sleep/11563 brk() ...
     0.221 sleep/11563 raw_syscalls:sys_exit:NR 12 = 94682642460672
     0.224 (         ): sleep/11563 open(filename: , flags: CLOEXEC) ...
     0.228 sleep/11563 raw_syscalls:sys_exit:NR 2 = 3
     0.229 (         ): sleep/11563 fstat(fd: 3, statbuf: 0x7f81fdaacaa0) ...
     0.230 sleep/11563 raw_syscalls:sys_exit:NR 5 = 0
     0.231 (         ): sleep/11563 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) ...
     0.234 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196166418432
     0.237 (         ): sleep/11563 close(fd: 3) ...
     0.238 sleep/11563 raw_syscalls:sys_exit:NR 3 = 0
     0.262 (         ): sleep/11563 nanosleep(rqtp: 0x7ffdbf7126f0) ...
  1000.399 sleep/11563 raw_syscalls:sys_exit:NR 35 = 0
  1000.440 (         ): sleep/11563 close(fd: 1) ...
  1000.447 sleep/11563 raw_syscalls:sys_exit:NR 3 = 0
  1000.454 (         ): sleep/11563 close(fd: 2) ...
  1000.468 (         ): sleep/11563 exit_group(                                                           )
  #

In the next csets we'll connect those events to the existing enter/exit
raw_syscalls handlers in 'perf trace', just like we did with the
syscalls:sys_{enter,exit}_* tracepoints.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-5nl8l4hx1tl9pqdx65nkp6pw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-01 14:11:45 -03:00
Linus Torvalds 82aa467151 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) BPF verifier fixes from Daniel Borkmann.

 2) HNS driver fixes from Huazhong Tan.

 3) FDB only works for ethernet devices, reject attempts to install FDB
    rules for others. From Ido Schimmel.

 4) Fix spectre V1 in vhost, from Jason Wang.

 5) Don't pass on-stack object to irq_set_affinity_hint() in mvpp2
    driver, from Marc Zyngier.

 6) Fix mlx5e checksum handling when RXFCS is enabled, from Eric
    Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits)
  openvswitch: Fix push/pop ethernet validation
  net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
  bpf: test make sure to run unpriv test cases in test_verifier
  bpf: add various test cases to test_verifier
  bpf: don't set id on after map lookup with ptr_to_map_val return
  bpf: fix partial copy of map_ptr when dst is scalar
  libbpf: Fix compile error in libbpf_attach_type_by_name
  kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists
  selftests: mlxsw: qos_mc_aware: Add a test for UC awareness
  selftests: mlxsw: qos_mc_aware: Tweak for min shaper
  mlxsw: spectrum: Set minimum shaper on MC TCs
  mlxsw: reg: QEEC: Add minimum shaper fields
  net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()
  net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
  net: hns3: bugfix for handling mailbox while the command queue reinitialized
  net: hns3: fix incorrect return value/type of some functions
  net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
  net: hns3: bugfix for is_valid_csq_clean_head()
  net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
  net: hns3: bugfix for the initialization of command queue's spin lock
  ...
2018-11-01 09:16:01 -07:00
Will Deacon 51f5fd2e46 tools headers barrier: Fix arm64 tools build failure wrt smp_load_{acquire,release}
Cheers for reporting this. I managed to reproduce the build failure with
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1).

The code in question is the arm64 versions of smp_load_acquire() and
smp_store_release(). Unlike other architectures, these are not built
around READ_ONCE() and WRITE_ONCE() since we have instructions we can
use instead of fences. Bringing our macros up-to-date with those (i.e.
tweaking the union initialisation and using the special "uXX_alias_t"
types) appears to fix the issue for me.

Committer notes:

Testing it in the systems previously failing:

  # time dm android-ndk:r12b-arm \
         android-ndk:r15c-arm \
         debian:experimental-x-arm64 \
         ubuntu:14.04.4-x-linaro-arm64 \
         ubuntu:16.04-x-arm \
         ubuntu:16.04-x-arm64 \
         ubuntu:18.04-x-arm \
         ubuntu:18.04-x-arm64
    1 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
    2 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
    3 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 8.2.0-7) 8.2.0
    4 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
    5 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    6 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    7 ubuntu:18.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0
    8 ubuntu:18.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181031174408.GA27871@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-01 10:07:43 -03:00
Josh Poimboeuf bcb6fb5da7 objtool: Support GCC 9 cold subfunction naming scheme
Starting with GCC 8, a lot of unlikely code was moved out of line to
"cold" subfunctions in .text.unlikely.

For example, the unlikely bits of:

  irq_do_set_affinity()

are moved out to the following subfunction:

  irq_do_set_affinity.cold.49()

Starting with GCC 9, the numbered suffix has been removed.  So in the
above example, the cold subfunction is instead:

  irq_do_set_affinity.cold()

Tweak the objtool subfunction detection logic so that it detects both
GCC 8 and GCC 9 naming schemes.

Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/015e9544b1f188d36a7f02fa31e9e95629aa5f50.1541040800.git.jpoimboe@redhat.com
2018-11-01 09:55:38 +01:00
David S. Miller df975da4e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-11-01

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix tcp_bpf_recvmsg() to return -EAGAIN instead of 0 in non-blocking
   case when no data is available yet, from John.

2) Fix a compilation error in libbpf_attach_type_by_name() when compiled
   with clang 3.8, from Andrey.

3) Fix a partial copy of map pointer on scalar alu and remove id
   generation for RET_PTR_TO_MAP_VALUE return types, from Daniel.

4) Add unlimited memlock limit for kernel selftest's flow_dissector_load
   program, from Yonghong.

5) Fix ping for some BPF shell based kselftests where distro does not
   ship "ping -6" anymore, from Li.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 17:34:08 -07:00
Daniel Borkmann 832c6f2c29 bpf: test make sure to run unpriv test cases in test_verifier
Right now unprivileged tests are never executed as a BPF test run,
only loaded. Allow for running them as well so that we can check
the outcome and probe for regressions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:17 -07:00
Daniel Borkmann 2683f4128c bpf: add various test cases to test_verifier
Add some more map related test cases to test_verifier kselftest
to improve test coverage. Summary: 1012 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:17 -07:00
Naveen N. Rao 1936f094e1 selftests/powerpc: Fix compilation issue due to asm label
We are using 'dscr_insn' as a label in inline asm to identify if a
SIGILL was generated by the mtspr instruction at that point. However,
with inline assembly, the compiler is still free to duplicate the asm
statement for optimization purposes, which results in the label being
defined twice with the error:
	/tmp/ccerQCql.s:874: Error: symbol `dscr_insn' is already defined

With different compiler versions, we may also see:
	/tmp/ccJzLDlN.o:(.toc+0x0): undefined reference to `dscr_insn'

Remove the use of the label in the inline assembly. Instead, just look
for the offending instruction in the signal handler.

Fixes: d2bf793237 ("selftests/powerpc: Add test to verify rfi flush across a system call")
Reported-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-01 10:06:03 +11:00
Andrey Ignatov 3615353218 libbpf: Fix compile error in libbpf_attach_type_by_name
Arnaldo Carvalho de Melo reported build error in libbpf when clang
version 3.8.1-24 (tags/RELEASE_381/final) is used:

libbpf.c:2201:36: error: comparison of constant -22 with expression of
type 'const enum bpf_attach_type' is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
                if (section_names[i].attach_type == -EINVAL)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~
1 error generated.

Fix the error by keeping "is_attachable" property of a program in a
separate struct field instead of trying to use attach_type itself.

Fixes: 956b620fcf ("libbpf: Introduce libbpf_attach_type_by_name")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-31 23:06:17 +01:00
Li Zhijian deee2cae27 kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists
ping binary on some distros doesn't support "ping -6" anymore.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-31 23:05:30 +01:00