Commit Graph

783637 Commits

Author SHA1 Message Date
Roman Gushchin 25025e0aab bpf: sync include/uapi/linux/bpf.h to tools/include/uapi/linux/bpf.h
The sync is required due to the appearance of a new map type:
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, which implements per-cpu
cgroup local storage.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:33 +02:00
Roman Gushchin c6fdcd6e0c bpf: don't allow create maps of per-cpu cgroup local storages
Explicitly forbid creating map of per-cpu cgroup local storages.
This behavior matches the behavior of shared cgroup storages.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:33 +02:00
Roman Gushchin b741f16303 bpf: introduce per-cpu cgroup local storage
This commit introduced per-cpu cgroup local storage.

Per-cpu cgroup local storage is very similar to simple cgroup storage
(let's call it shared), except all the data is per-cpu.

The main goal of per-cpu variant is to implement super fast
counters (e.g. packet counters), which don't require neither
lookups, neither atomic operations.

>From userspace's point of view, accessing a per-cpu cgroup storage
is similar to other per-cpu map types (e.g. per-cpu hashmaps and
arrays).

Writing to a per-cpu cgroup storage is not atomic, but is performed
by copying longs, so some minimal atomicity is here, exactly
as with other per-cpu maps.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:32 +02:00
Roman Gushchin f294b37ec7 bpf: rework cgroup storage pointer passing
To simplify the following introduction of per-cpu cgroup storage,
let's rework a bit a mechanism of passing a pointer to a cgroup
storage into the bpf_get_local_storage(). Let's save a pointer
to the corresponding bpf_cgroup_storage structure, instead of
a pointer to the actual buffer.

It will help us to handle per-cpu storage later, which has
a different way of accessing to the actual data.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:32 +02:00
Roman Gushchin 8bad74f984 bpf: extend cgroup bpf core to allow multiple cgroup storage types
In order to introduce per-cpu cgroup storage, let's generalize
bpf cgroup core to support multiple cgroup storage types.
Potentially, per-node cgroup storage can be added later.

This commit is mostly a formal change that replaces
cgroup_storage pointer with a array of cgroup_storage pointers.
It doesn't actually introduce a new storage type,
it will be done later.

Each bpf program is now able to have one cgroup storage of each type.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:32 +02:00
Yonghong Song 5bf7a60b8e bpf: permit CGROUP_DEVICE programs accessing helper bpf_get_current_cgroup_id()
Currently, helper bpf_get_current_cgroup_id() is not permitted
for CGROUP_DEVICE type of programs. If the helper is used
in such cases, the verifier will log the following error:

  0: (bf) r6 = r1
  1: (69) r7 = *(u16 *)(r6 +0)
  2: (85) call bpf_get_current_cgroup_id#80
  unknown func bpf_get_current_cgroup_id#80

The bpf_get_current_cgroup_id() is useful for CGROUP_DEVICE
type of programs in order to customize action based on cgroup id.
This patch added such a support.

Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-28 14:15:19 +02:00
Daniel Borkmann 78e6e5c11a Merge branch 'bpf-libbpf-attach-by-name'
Andrey Ignatov says:

====================
This patch set introduces libbpf_attach_type_by_name function in libbpf
to identify attach type by section name.

This is useful to avoid writing same logic over and over again in user
space applications that leverage libbpf.

Patch 1 has more details on the new function and problem being solved.
Patches 2 and 3 add support for new section names.
Patch 4 uses new function in a selftest.
Patch 5 adds selftest for libbpf_{prog,attach}_type_by_name.

As a side note there are a lot of inconsistencies now between names used
by libbpf and bpftool (e.g. cgroup/skb vs cgroup_skb, cgroup_device and
device vs cgroup/dev, sockops vs sock_ops, etc). This patch set does not
address it but it tries not to make it harder to address it in the future.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-27 21:19:34 +02:00
Andrey Ignatov 370920c47b selftests/bpf: Test libbpf_{prog,attach}_type_by_name
Add selftest for libbpf functions libbpf_prog_type_by_name and
libbpf_attach_type_by_name.

Example of output:
  % ./tools/testing/selftests/bpf/test_section_names
  Summary: 35 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-27 21:14:59 +02:00
Andrey Ignatov c9bf507d0a selftests/bpf: Use libbpf_attach_type_by_name in test_socket_cookie
Use newly introduced libbpf_attach_type_by_name in test_socket_cookie
selftest.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-27 21:14:59 +02:00
Andrey Ignatov c6f6851b28 libbpf: Support sk_skb/stream_{parser, verdict} section names
Add section names for BPF_SK_SKB_STREAM_PARSER and
BPF_SK_SKB_STREAM_VERDICT attach types to be able to identify them in
libbpf_attach_type_by_name.

"stream_parser" and "stream_verdict" are used instead of simple "parser"
and "verdict" just to avoid possible confusion in a place where attach
type is used alone (e.g. in bpftool's show sub-commands) since there is
another attach point that can be named as "verdict": BPF_SK_MSG_VERDICT.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-27 21:14:59 +02:00
Andrey Ignatov bafa7afe63 libbpf: Support cgroup_skb/{e,in}gress section names
Add section names for BPF_CGROUP_INET_INGRESS and BPF_CGROUP_INET_EGRESS
attach types to be able to identify them in libbpf_attach_type_by_name.

"cgroup_skb" is used instead of "cgroup/skb" mostly to easy possible
unifying of how libbpf and bpftool works with section names:
* bpftool uses "cgroup_skb" to in "prog list" sub-command;
* bpftool uses "ingress" and "egress" in "cgroup list" sub-command;
* having two parts instead of three in a string like "cgroup_skb/ingress"
  can be leveraged to split it to prog_type part and attach_type part,
  or vise versa: use two parts to make a section name.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-27 21:14:59 +02:00
Andrey Ignatov 956b620fcf libbpf: Introduce libbpf_attach_type_by_name
There is a common use-case when ELF object contains multiple BPF
programs and every program has its own section name. If it's cgroup-bpf
then programs have to be 1) loaded and 2) attached to a cgroup.

It's convenient to have information necessary to load BPF program
together with program itself. This is where section name works fine in
conjunction with libbpf_prog_type_by_name that identifies prog_type and
expected_attach_type and these can be used with BPF_PROG_LOAD.

But there is currently no way to identify attach_type by section name
and it leads to messy code in user space that reinvents guessing logic
every time it has to identify attach type to use with BPF_PROG_ATTACH.

The patch introduces libbpf_attach_type_by_name that guesses attach type
by section name if a program can be attached.

The difference between expected_attach_type provided by
libbpf_prog_type_by_name and attach_type provided by
libbpf_attach_type_by_name is the former is used at BPF_PROG_LOAD time
and can be zero if a program of prog_type X has only one corresponding
attach type Y whether the latter provides specific attach type to use
with BPF_PROG_ATTACH.

No new section names were added to section_names array. Only existing
ones were reorganized and attach_type was added where appropriate.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-27 21:14:59 +02:00
Song Liu 100811936f bpf: test_bpf: add init_net to dev for flow_dissector
Latest changes in __skb_flow_dissect() assume skb->dev has valid nd_net.
However, this is not true for test_bpf. As a result, test_bpf.ko crashes
the system with the following stack trace:

[ 1133.716622] BUG: unable to handle kernel paging request at 0000000000001030
[ 1133.716623] PGD 8000001fbf7ee067
[ 1133.716624] P4D 8000001fbf7ee067
[ 1133.716624] PUD 1f6c1cf067
[ 1133.716625] PMD 0
[ 1133.716628] Oops: 0000 [#1] SMP PTI
[ 1133.716630] CPU: 7 PID: 40473 Comm: modprobe Kdump: loaded Not tainted 4.19.0-rc5-00805-gca11cc92ccd2 #1167
[ 1133.716631] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM12.5 12/06/2017
[ 1133.716638] RIP: 0010:__skb_flow_dissect+0x83/0x1680
[ 1133.716639] Code: 04 00 00 41 0f b7 44 24 04 48 85 db 4d 8d 14 07 0f 84 01 02 00 00 48 8b 43 10 48 85 c0 0f 84 e5 01 00 00 48 8b 80 a8 04 00 00 <48> 8b 90 30 10 00 00 48 85 d2 0f 84 dd 01 00 00 31 c0 b9 05 00 00
[ 1133.716640] RSP: 0018:ffffc900303c7a80 EFLAGS: 00010282
[ 1133.716642] RAX: 0000000000000000 RBX: ffff881fea0b7400 RCX: 0000000000000000
[ 1133.716643] RDX: ffffc900303c7bb4 RSI: ffffffff8235c3e0 RDI: ffff881fea0b7400
[ 1133.716643] RBP: ffffc900303c7b80 R08: 0000000000000000 R09: 000000000000000e
[ 1133.716644] R10: ffffc900303c7bb4 R11: ffff881fb6840400 R12: ffffffff8235c3e0
[ 1133.716645] R13: 0000000000000008 R14: 000000000000001e R15: ffffc900303c7bb4
[ 1133.716646] FS:  00007f54e75d3740(0000) GS:ffff881fff5c0000(0000) knlGS:0000000000000000
[ 1133.716648] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1133.716649] CR2: 0000000000001030 CR3: 0000001f6c226005 CR4: 00000000003606e0
[ 1133.716649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1133.716650] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1133.716651] Call Trace:
[ 1133.716660]  ? sched_clock_cpu+0xc/0xa0
[ 1133.716662]  ? sched_clock_cpu+0xc/0xa0
[ 1133.716665]  ? log_store+0x1b5/0x260
[ 1133.716667]  ? up+0x12/0x60
[ 1133.716669]  ? skb_get_poff+0x4b/0xa0
[ 1133.716674]  ? __kmalloc_reserve.isra.47+0x2e/0x80
[ 1133.716675]  skb_get_poff+0x4b/0xa0
[ 1133.716680]  bpf_skb_get_pay_offset+0xa/0x10
[ 1133.716686]  ? test_bpf_init+0x578/0x1000 [test_bpf]
[ 1133.716690]  ? netlink_broadcast_filtered+0x153/0x3d0
[ 1133.716695]  ? free_pcppages_bulk+0x324/0x600
[ 1133.716696]  ? 0xffffffffa0279000
[ 1133.716699]  ? do_one_initcall+0x46/0x1bd
[ 1133.716704]  ? kmem_cache_alloc_trace+0x144/0x1a0
[ 1133.716709]  ? do_init_module+0x5b/0x209
[ 1133.716712]  ? load_module+0x2136/0x25d0
[ 1133.716715]  ? __do_sys_finit_module+0xba/0xe0
[ 1133.716717]  ? __do_sys_finit_module+0xba/0xe0
[ 1133.716719]  ? do_syscall_64+0x48/0x100
[ 1133.716724]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

This patch fixes tes_bpf by using init_net in the dummy dev.

Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-27 21:09:45 +02:00
Andrey Ignatov 53d6eb08e9 bpftool: Fix bpftool net output
Print `bpftool net` output to stdout instead of stderr. Only errors
should be printed to stderr. Regular output should go to stdout and this
is what all other subcommands of bpftool do, including --json and
--pretty formats of `bpftool net` itself.

Fixes: commit f6f3bac08f ("tools/bpf: bpftool: add net support")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-27 21:07:44 +02:00
Maciej Żenczykowski 1042caa79e net-ipv4: remove 2 always zero parameters from ipv4_redirect()
(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:30:55 -07:00
Maciej Żenczykowski d888f39666 net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()
(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:30:55 -07:00
Maxime Chevallier da58a931f2 net: mvneta: Add support for 2500Mbps SGMII
The mvneta controller can handle speeds up to 2500Mbps on the SGMII
interface. This relies on serdes configuration, the lane must be
configured at 3.125Gbps and we can't use in-band autoneg at that speed.

The main issue when supporting that speed on this particular controller
is that the link partner can send ethernet frames with a shortened
preamble, which if not explicitly enabled in the controller will cause
unexpected behaviours.

This was tested on Armada 385, with the comphy configuration done in
bootloader.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:27:09 -07:00
David S. Miller c09c1474d8 Merge branch 'net-vhost-improve-performance-when-enable-busyloop'
Tonghao Zhang says:

====================
net: vhost: improve performance when enable busyloop

This patches improve the guest receive performance.
On the handle_tx side, we poll the sock receive queue
at the same time. handle_rx do that in the same way.

For more performance report, see patch 4
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:25:55 -07:00
Tonghao Zhang 441abde4cd net: vhost: add rx busy polling in tx path
This patch improves the guest receive performance.
On the handle_tx side, we poll the sock receive queue at the
same time. handle_rx do that in the same way.

We set the poll-us=100us and use the netperf to test throughput
and mean latency. When running the tests, the vhost-net kthread
of that VM, is alway 100% CPU. The commands are shown as below.

Rx performance is greatly improved by this patch. There is not
notable performance change on tx with this series though. This
patch is useful for bi-directional traffic.

netperf -H IP -t TCP_STREAM -l 20 -- -O "THROUGHPUT, THROUGHPUT_UNITS, MEAN_LATENCY"

Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]

TCP_STREAM:
* Without the patch:  19842.95 Mbps, 6.50 us mean latency
* With the patch:     37598.20 Mbps, 3.43 us mean latency

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:25:55 -07:00
Tonghao Zhang dc151282bb net: vhost: factor out busy polling logic to vhost_net_busy_poll()
Factor out generic busy polling logic and will be
used for in tx path in the next patch. And with the patch,
qemu can set differently the busyloop_timeout for rx queue.

To avoid duplicate codes, introduce the helper functions:
* sock_has_rx_data(changed from sk_has_rx_data)
* vhost_net_busy_poll_try_queue

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:25:55 -07:00
Tonghao Zhang a6a67a2f34 net: vhost: replace magic number of lock annotation
Use the VHOST_NET_VQ_XXX as a subclass for mutex_lock_nested.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:25:55 -07:00
Tonghao Zhang 78139c94dc net: vhost: lock the vqs one by one
This patch changes the way that lock all vqs
at the same, to lock them one by one. It will
be used for next patch to avoid the deadlock.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:25:54 -07:00
Yafang Shao af4325ecc2 tcp: expose sk_state in tcp_retransmit_skb tracepoint
After sk_state exposed, we can get in which state this retransmission
occurs. That could give us more detail for dignostic.
For example, if this retransmission occurs in SYN_SENT state, it may
also indicates that the syn packet may be dropped on the remote peer due
to syn backlog queue full and then we could check the remote peer.

BTW,SYNACK retransmission is traced in tcp_retransmit_synack tracepoint.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:07:19 -07:00
YueHaibing 0a71515665 net: faraday: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.

Found by coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:18:08 -07:00
YueHaibing 6323d57f33 net: smsc: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.

Found by coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:15:17 -07:00
zhong jiang 880e1b2111 net: liquidio: list usage cleanup
Trival cleanup, list_move_tail will implement the same function that
list_del() + list_add_tail() will do. hence just replace them.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:12:10 -07:00
zhong jiang 631e871edc net: qed: list usage cleanup
Trival cleanup, list_move_tail will implement the same function that
list_del() + list_add_tail() will do. hence just replace them.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:11:36 -07:00
David S. Miller 30b0594a3e Merge branch 'net-bridge-convert-bool-options-to-bits'
Nikolay Aleksandrov says:

====================
net: bridge: convert bool options to bits

A lot of boolean bridge options have been added around the net_bridge
structure resulting in holes and more importantly different cache lines
that need to be fetched in the fast path. This set moves all of those
to bits in a bitfield which resides in a hot cache line thus reducing
the size of net_bridge, the number of holes and the number of cache
lines needed for the fast path.
The set is also sent in preparation for new boolean options to avoid
spreading them in the structure and making new holes.
One nice side-effect is that we avoid potential race conditions by using
the bitops since some of the options were bits being directly set in
parallel risking hard to debug issues (has_ipv6_addr).

Before:
 size: 1184, holes: 8, sum holes: 30
After:
 size: 1160, holes: 3, sum holes: 7

Patch 01 is a trivial style fix
Patch 02 adds the new options bitfield and converts the vlan boolean
         options to bits
Patches 03-08 convert the rest of the boolean options to bits
Patch 09 re-arranges a few fields in net_bridge to further reduce size

v2: patch 09: remove the comment about offload_fwd_mark in net_bridge and
    leave it where it is now, thanks to Ido for spotting it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov 35750b0bca net: bridge: pack net_bridge better
Further reduce the size of net_bridge with 8 bytes and reduce the number of
holes in it:
 Before: holes: 5, sum holes: 15
 After: holes: 3, sum holes: 7

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov 3341d91702 net: bridge: convert mtu_set_by_user to a bit
Convert the last remaining bool option to a bit thus reducing the overall
net_bridge size further by 8 bytes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov c69c2cd444 net: bridge: convert neigh_suppress_enabled option to a bit
Convert the neigh_suppress_enabled option to a bit.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov 675779adbf net: bridge: convert mcast options to bits
This patch converts the rest of the mcast options to bits. It also packs
the mcast options a little better by moving multicast_mld_version to an
existing hole, reducing the net_bridge size by 8 bytes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov 13cefad2f2 net: bridge: convert and rename mcast disabled
Convert mcast disabled to an option bit and while doing so convert the
logic to check if multicast is enabled instead. That is make the logic
follow the option value - if it's set then mcast is enabled and vice versa.
This avoids a few confusing places where we inverted the value that's being
set to follow the mcast_disabled logic.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov be3664a038 net: bridge: convert group_addr_set option to a bit
Convert group_addr_set internal bridge opt to a bit.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov 8df3510f28 net: bridge: convert nf call options to bits
No functional change, convert of nf_call_[ip|ip6|arp]tables to bits.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov ae75767ec2 net: bridge: add bitfield for options and convert vlan opts
Bridge options have usually been added as separate fields all over the
net_bridge struct taking up space and ending up in different cache lines.
Let's move them to a single bitfield to save up space and speedup lookups.
This patch adds a simple API for option modifying and retrieving using
bitops and converts the first user of the API - the bridge vlan options
(vlan_enabled and vlan_stats_enabled).

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:22 -07:00
Nikolay Aleksandrov 1c1cb6d032 net: bridge: make struct opening bracket consistent
Currently we have a mix of opening brackets on new lines and on the same
line, let's move them all on the same line.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:22 -07:00
David S. Miller 37ac5db6e6 Merge branch 's390-net-next'
Julian Wiedmann says:

====================
s390/net: updates 2018-09-26

please apply one more series of cleanups and small improvements for qeth
to net-next. Note that one patch needs to touch both af_iucv and qeth, in
order to untangle their receive paths.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:08 -07:00
Julian Wiedmann 91cc98f51e s390/qeth: remove duplicated carrier state tracking
The netdevice is always available, apply any carrier state changes to it
without caching them.
On a STARTLAN event (ie. carrier-up), defer updating the state to
qeth_core_hardsetup_card() in the subsequent recovery action.

Also remove the carrier-state checks from the xmit routines. Stopping
transmission on carrier-down is the responsibility of upper-level code
(eg see dev_direct_xmit()).

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:08 -07:00
Julian Wiedmann d782d80f36 s390/qeth: clean up drop conditions for received cmds
If qeth_check_ipa_data() consumed an event, there's no point in
processing it further. So drop it early, and make the surrounding code
a tiny bit more readable.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann d19b93f40e s390/qeth: re-indent qeth_check_ipa_data()
Pull one level of checking up into qeth_send_control_data_cb(), and
clean up an else-after-return. No functional change.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann 68bba11643 s390/qeth: consume local address events
We have no code that is waiting for these events, so just drop them when
they arrive.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann 6585ac4e5d s390/qeth: remove various redundant code
1. tracing iob->rc makes no sense when it hasn't been modified by the
   callback,
2. the qeth_dbf_list is declared with LIST_HEAD, which also initializes
   the list,
3. the ccwgroup core only calls the thaw/restore callbacks if the gdev
   is online, so we don't have to check for it again.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann 8d908eb045 s390/qeth: remove CARD_FROM_CDEV helper
The cdev-to-card translation walks through two layers of drvdata,
with no locking or refcounting (where eg. the ccwgroup core only
accesses a cdev's drvdata while holding the ccwlock).

This might be safe for now, but any careless usage of the helper has the
potential for subtle races and use-after-free's. Luckily there's only
one occurrence where we _really_ need it (in qeth_irq()), for any other
user we can just pass through an appropriate card pointer.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann 8f6637b878 s390/qeth: pass card pointer in iob callback
This allows us to remove the CARD_FROM_CDEV calls in the iob callbacks.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann 6a3123d076 s390/qeth: re-use qeth_notify_skbs()
When not using the CQ, this allows us avoid the second skb queue walk
in qeth_release_skbs().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann 5a5312bdba s390/qeth: remove additional skb refcount
This was presumably left over from back when qeth recursed into
dev_queue_xmit().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann dc149e3764 s390/qeth: replace open-coded skb_queue_walk()
To match the use of __skb_queue_purge(), also make the skb's enqueue in
qeth_fill_buffer() lockless.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann cd11d11286 net/af_iucv: locate IUCV header via skb_network_header()
This patch attempts to untangle the TX and RX code in qeth from
af_iucv's respective HiperTransport path:
On the TX side, pointing skb_network_header() at the IUCV header
means that qeth_l3_fill_af_iucv_hdr() no longer needs a magical offset
to access the header.
On the RX side, qeth pulls the (fake) L2 header off the skb like any
normal ethernet driver would. This makes working with the IUCV header
in af_iucv easier, since we no longer have to assume a fixed skb layout.

While at it, replace the open-coded length checks in af_iucv's RX path
with pskb_may_pull().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Julian Wiedmann a2eb0ad50c s390/qeth: on gdev release, reset drvdata
qeth_core_probe_device() sets the gdev's drvdata, but doesn't reset it
on a subsequent error. Move the (re-)setting around a bit, so that it
happens symmetrically on allocating/freeing the qeth_card struct.

This is no actual problem, as the ccwgroup core will discard the gdev
on a probe error. But from qeth's perspective the gdev is an external
resource, so it's best to manage it cleanly.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00