Fix incorrect netdev reference count in xsk_bind operation. Incorrect
reference count of the device appears when a user calls bind with the
XDP_ZEROCOPY flag on an interface which does not support zero-copy.
In such a case, an error is returned but the reference count is not
decreased. This change fixes the fault, by decreasing the reference count
in case of such an error.
The problem being corrected appeared in '162c820ed896' for the first time,
and the code was moved to new file location over the time with commit
'c2d3d6a47462'. This specific patch applies to all version starting
from 'c2d3d6a47462'. The same solution should be applied but on different
file (net/xdp/xdp_umem.c) and function (xdp_umem_assign_dev) for versions
from '162c820ed896' to 'c2d3d6a47462' excluded.
Fixes: 162c820ed8 ("xdp: hold device for umem regardless of zero-copy mode")
Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20201120151443.105903-1-marekx.majtyka@intel.com
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl+3q+gTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRCpyVqK+u3vqYdjB/46Qv583QBIbM8YaXN+XTSfxSufp0Ku
G0R88mJKdOalgeiD58TaZ3hAegjOW44nCMmZP64mQB7XNvRiyNk8j69VDsRy+4Zy
r86Hi0TIlpeM5oNHb84SpvIO01elmuYNlaoU9t7NQV5YlKA2Lq+kcmMLGa7Z1zVk
IZ2KF2eF2V83p/4A5t1cgQV6LRlDBgAeoe/9pyG0mNX3Fv8lI3WIqjRkuevSm6iJ
XMEuc7kjUZ8Yd59Vvnipm89teD5lCyIpxeJ2hrOsiUlkwyosdCxewMeohJ596a2L
t5DoiT+3nJLOBQ/IKCYe5RoCf7sFZ8nfjqcNYeOm7Yw1CbXg9KwGZTU7
=BedD
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-5.11-20201120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2020-11-20
The first patch is by Yegor Yefremov and he improves the j1939 documentaton by
adding tables for the CAN identifier and its fields.
Then there are 8 patches by Oliver Hartkopp targeting the CAN driver
infrastructure and drivers. These add support for optional DLC element to the
Classical CAN frame structure. See patch ea7800565a ("can: add optional DLC
element to Classical CAN frame structure") for details. Oliver's last patch
adds len8_dlc support to several drivers. Stefan Mätje provides a patch to add
len8_dlc support to the esd_usb2 driver.
The next patch is by Oliver Hartkopp, too and adds support for modification of
Classical CAN DLCs to CAN GW sockets.
The next 3 patches target the nxp,flexcan DT bindings. One patch by my adds the
missing uint32 reference to the clock-frequency property. Joakim Zhang's
patches fix the fsl,clk-source property and add the IMX_SC_R_CAN() macro to the
imx firmware header file, which will be used in the flexcan driver later.
Another patch by Joakim Zhang prepares the flexcan driver for SCU based
stop-mode, by giving the existing, GPR based stop-mode, a _GPR postfix.
The next 5 patches are by me, target the flexcan driver, and clean up the
.ndo_open and .ndo_stop callbacks. These patches try to fix a sporadically
hanging flexcan_close() during simultanious ifdown, sending of CAN messages and
probably open CAN bus. I was never able to reproduce, but these seem to fix the
problem at the reporting user. As these changes are rather big, I'd like to
mainline them via net-next/master.
The next patches are by Jimmy Assarsson and Christer Beskow, they add support
for new USB devices to the existing kvaser_usb driver.
The last patch is by Kaixu Xia and simplifies the return in the
mcp251xfd_chip_softreset() function in the mcp251xfd driver.
* tag 'linux-can-next-for-5.11-20201120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (25 commits)
can: mcp251xfd: remove useless code in mcp251xfd_chip_softreset
can: kvaser_usb: Add new Kvaser hydra devices
can: kvaser_usb: kvaser_usb_hydra: Add support for new device variant
can: kvaser_usb: Add new Kvaser Leaf v2 devices
can: kvaser_usb: Add USB_{LEAF,HYDRA}_PRODUCT_ID_END defines
can: flexcan: flexcan_close(): change order if commands to properly shut down the controller
can: flexcan: flexcan_open(): completely initialize controller before requesting IRQ
can: flexcan: flexcan_rx_offload_setup(): factor out mailbox and rx-offload setup into separate function
can: flexcan: move enabling/disabling of interrupts from flexcan_chip_{start,stop}() to callers
can: flexcan: factor out enabling and disabling of interrupts into separate function
can: flexcan: rename macro FLEXCAN_QUIRK_SETUP_STOP_MODE -> FLEXCAN_QUIRK_SETUP_STOP_MODE_GPR
dt-bindings: firmware: add IMX_SC_R_CAN(x) macro for CAN
dt-bindings: can: fsl,flexcan: fix fsl,clk-source property
dt-bindings: can: fsl,flexcan: add uint32 reference to clock-frequency property
can: gw: support modification of Classical CAN DLCs
can: drivers: add len8_dlc support for esd_usb2 CAN adapter
can: drivers: add len8_dlc support for various CAN adapters
can: drivers: introduce helpers to access Classical CAN DLC values
can: update documentation for DLC usage in Classical CAN
can: rename CAN FD related can_len2dlc and can_dlc2len helpers
...
====================
Link: https://lore.kernel.org/r/20201120133318.3428231-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Child sockets erroneously inherit their parent's sk_type (ie. SOCK_*),
instead of the PF_IUCV protocol that the parent was created with in
iucv_sock_create().
We're currently not using sk->sk_protocol ourselves, so this shouldn't
have much impact (except eg. getting the output in skb_dump() right).
Fixes: eac3731bd0 ("[S390]: Add AF_IUCV socket support")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Link: https://lore.kernel.org/r/20201120100657.34407-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use netdev->tstats instead of a member of net_bridge for storing
a pointer to the per-cpu counters. This allows us to use core
functionality for statistics handling.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/9bad2be2-fd84-7c6e-912f-cee433787018@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When setting congestion control via a BPF program it is seen that the
SYN/ACK for packets within a given flow will not include the ECT0 flag. A
bit of simple printk debugging shows that when this is configured without
BPF we will see the value INET_ECN_xmit value initialized in
tcp_assign_congestion_control however when we configure this via BPF the
socket is in the closed state and as such it isn't configured, and I do not
see it being initialized when we transition the socket into the listen
state. The result of this is that the ECT0 bit is configured based on
whatever the default state is for the socket.
Any easy way to reproduce this is to monitor the following with tcpdump:
tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca
Without this patch the SYN/ACK will follow whatever the default is. If dctcp
all SYN/ACK packets will have the ECT0 bit set, and if it is not then ECT0
will be cleared on all SYN/ACK packets. With this patch applied the SYN/ACK
bit matches the value seen on the other packets in the given stream.
Fixes: 91b5b21c7c ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An issue was recently found where DCTCP SYN/ACK packets did not have the
ECT bit set in the L3 header. A bit of code review found that the recent
change referenced below had gone though and added a mask that prevented the
ECN bits from being populated in the L3 header.
This patch addresses that by rolling back the mask so that it is only
applied to the flags coming from the incoming TCP request instead of
applying it to the socket tos/tclass field. Doing this the ECT bits were
restored in the SYN/ACK packets in my testing.
One thing that is not addressed by this patch set is the fact that
tcp_reflect_tos appears to be incompatible with ECN based congestion
avoidance algorithms. At a minimum the feature should likely be documented
which it currently isn't.
Fixes: ac8f1710c1 ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Send timely MPTCP-level ack is somewhat difficult when
the insertion into the msk receive level is performed
by the worker.
It needs TCP-level dup-ack to notify the MPTCP-level
ack_seq increase, as both the TCP-level ack seq and the
rcv window are unchanged.
We can actually avoid processing incoming data with the
worker, and let the subflow or recevmsg() send ack as needed.
When recvmsg() moves the skbs inside the msk receive queue,
the msk space is still unchanged, so tcp_cleanup_rbuf() could
end-up skipping TCP-level ack generation. Anyway, when
__mptcp_move_skbs() is invoked, a known amount of bytes is
going to be consumed soon: we update rcv wnd computation taking
them in account.
Additionally we need to explicitly trigger tcp_cleanup_rbuf()
when recvmsg() consumes a significant amount of the receive buffer.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
OoO handling attempts to detect when packet is out-of-window by testing
current ack sequence and remaining space vs. sequence number.
This doesn't work reliably. Store the highest allowed sequence number
that we've announced and use it to detect oow packets.
Do this when mptcp options get written to the packet (wire format).
For this to work we need to move the write_options call until after
stack selected a new tcp window.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When ADD_ADDR suboption includes an IPv6 address, the size is 28 octets.
It will not fit when other MPTCP suboptions are included in this packet,
e.g. DSS. So here we send out an ADD_ADDR dedicated packet to carry only
ADD_ADDR suboption, no other MPTCP suboptions.
In mptcp_pm_announce_addr, we check whether this is an IPv6 ADD_ADDR.
If it is, we set the flag MPTCP_ADD_ADDR_IPV6 to true. Then we call
mptcp_pm_nl_add_addr_send_ack to sent out a new pure ACK packet.
In mptcp_established_options_add_addr, we check whether this is a pure
ACK packet for ADD_ADDR. If it is, we drop all other MPTCP suboptions
in this packet, only put ADD_ADDR suboption in it.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch changed the 'add_addr_signal' type from bool to char, so that
we could encode the addr type there.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This will simplify all operation dealing with subflows
before accept time (e.g. data fin processing, add_addr).
The join list is already flushed by mptcp_stream_accept()
before returning the newly created msk to the user space.
This also fixes an potential bug present into the old code:
conn_list was manipulated without helding the msk lock
in mptcp_stream_accept().
Tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case a subflow path is blocked, MPTCP-level retransmit may not take
place anymore because such subflow is likely to have unacked data left
in its write queue.
Ignore subflows that have experienced loss and test next candidate.
Fixes: 3b1d6210a9 ("mptcp: implement and use MPTCP-level retransmission")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to cope with some more state transition for
fallback sockets, or could still end-up moving to TCP_CLOSE
too early and avoid spooling some pending data
Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Only mptcp_close() can actually cancel the workqueue,
no need to add and use this flag.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If dsa_switch_ops::port_txtstamp() returns false, clone will be freed
immediately. Shouldn't store a pointer to freed memory.
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201119110906.25558-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case when tcp socket received FIN after some data and the
parser haven't started before reading data caller will receive
an empty buffer. This behavior differs from plain TCP socket and
leads to special treating in user-space.
The flow that triggers the race is simple. Server sends small
amount of data right after the connection is configured to use TLS
and closes the connection. In this case receiver sees TLS Handshake
data, configures TLS socket right after Change Cipher Spec record.
While the configuration is in process, TCP socket receives small
Application Data record, Encrypted Alert record and FIN packet. So
the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with
SK_DONE bit set. The received data is not parsed upon arrival and is
never sent to user-space.
Patch unpauses parser directly if we have unparsed data in tcp
receive queue.
Fixes: fcf4793e27 ("tls: check RCV_SHUTDOWN in tls_wait_data")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1605801588-12236-1-git-send-email-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rose_send_frame() dereferences `neigh->dev` when called from
rose_transmit_clear_request(), and the first occurrence of the
`neigh` is in rose_loopback_timer() as `rose_loopback_neigh`,
and it is initialized in rose_add_loopback_neigh() as NULL.
i.e when `rose_loopback_neigh` used in rose_loopback_timer()
its `->dev` was still NULL and rose_loopback_timer() was calling
rose_rx_call_request() without checking for NULL.
- net/rose/rose_link.c
This bug seems to get triggered in this line:
rose_call = (ax25_address *)neigh->dev->dev_addr;
Fix it by adding NULL checking for `rose_loopback_neigh->dev`
in rose_loopback_timer().
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: syzbot+a1c743815982d9496393@syzkaller.appspotmail.com
Tested-by: syzbot+a1c743815982d9496393@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=9d2a7ca8c7f2e4b682c97578dfa3f236258300b3
Signed-off-by: Anmol Karn <anmol.karan123@gmail.com>
Link: https://lore.kernel.org/r/20201119191043.28813-1-anmol.karan123@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The static checker is fooled by the non-static locking scheme
implemented by the mentioned helpers.
Let's make its life easier adding some unconditional annotation
so that the helpers are now interpreted as a plain spinlock from
sparse.
v1 -> v2:
- add __releases() annotation to unlock_sock_fast()
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/6ed7ae627d8271fb7f20e0a9c6750fbba1ac2635.1605634911.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is no easy way to distinguish if a conntracked tcp packet is
marked invalid because of tcp_in_window() check error or because
it doesn't belong to an existing connection. With this patch,
openvswitch sets liberal tcp flag for the established sessions so
that out of window packets are not marked invalid.
A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
sets this flag for both the directions of the nf_conn.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201116130126.3065077-1-nusiddiq@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a bug that is triggered when a partially setup socket is
destroyed. For a fully setup socket, a socket that has been bound to a
device, the cleanup of the umem is performed at the end of the buffer
pool's cleanup work queue item. This has to be performed in a work
queue, and not in RCU cleanup, as it is doing a vunmap that cannot
execute in interrupt context. However, when a socket has only been
partially set up so that a umem has been created but the buffer pool
has not, the code erroneously directly calls the umem cleanup function
instead of using a work queue, and this leads to a BUG_ON() in
vunmap().
As there in this case is no buffer pool, we cannot use its work queue,
so we need to introduce a work queue for the umem and schedule this for
the cleanup. So in the case there is no pool, we are going to use the
umem's own work queue to schedule the cleanup. But if there is a
pool, the cleanup of the umem is still being performed by the pool's
work queue, as it is important that the umem is cleaned up after the
pool.
Fixes: e5e1a4bc91 ("xsk: Fix possible memory leak at socket close")
Reported-by: Marek Majtyka <marekx.majtyka@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Marek Majtyka <marekx.majtyka@intel.com>
Link: https://lore.kernel.org/bpf/1605873219-21629-1-git-send-email-magnus.karlsson@gmail.com
Add support for data length code modifications for Classical CAN.
The netlink configuration interface always allowed to pass any value
that fits into a byte, therefore only the modification process had to be
extended to handle the raw DLC represenation of Classical CAN frames.
When a DLC value from 0 .. F is provided for Classical CAN frame
modifications the 'len' value is modified as-is with the exception that
potentially existing 9 .. F DLC values in the len8_dlc element are moved
to the 'len' element for the modification operation by mod_retrieve_ccdlc().
After the modification the Classical CAN frame DLC information is brought
back into the correct format by mod_store_ccdlc() which is filling 'len'
and 'len8_dlc' accordingly.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201119084921.2621-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The naming of can_dlc as element of struct can_frame and also as variable
name is misleading as it claims to be a 'data length CODE' but in reality
it always was a plain data length.
With the indroduction of a new 'len' element in struct can_frame we can now
remove can_dlc as name and make clear which of the former uses was a plain
length (-> 'len') or a data length code (-> 'dlc') value.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201120100444.3199-1-socketcan@hartkopp.net
[mkl: gs_usb: keep struct gs_host_frame::can_dlc as is]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reset MHI device channels when driver remove is called due to
module unload or any crash scenario. This will make sure that
MHI channels no longer remain enabled for transfers since the
MHI stack does not take care of this anymore after the auto-start
channels feature was removed.
Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
This patch adds an IPv6 routes encapsulation attribute
to the result of netlink RTM_GETROUTE requests
(i.e. ip route get 2001:db8::).
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201118230651.GA8861@tws
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We must start the retransmission timer only there are
pending data in the rtx queue.
Otherwise we can hit a WARN_ON in mptcp_reset_timer(),
as syzbot demonstrated.
Reported-and-tested-by: syzbot+42aa53dafb66a07e5a24@syzkaller.appspotmail.com
Fixes: d9ca1de8c0 ("mptcp: move page frag allocation in mptcp_sendmsg()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/r/1a72039f112cae048c44d398ffa14e0a1432db3d.1605737083.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When performing a flash update via devlink, device drivers may inform
user space of status updates via
devlink_flash_update_(begin|end|timeout|status)_notify functions.
It is expected that drivers do not send any status notifications unless
they send a begin and end message. If a driver sends a status
notification without sending the appropriate end notification upon
finishing (regardless of success or failure), the current implementation
of the devlink userspace program can get stuck endlessly waiting for the
end notification that will never come.
The current ice driver implementation may send such a status message
without the appropriate end notification in rare cases.
Fixing the ice driver is relatively simple: we just need to send the
begin_notify at the start of the function and always send an end_notify
no matter how the function exits.
Rather than assuming driver authors will always get this right in the
future, lets just fix the API so that it is not possible to get wrong.
Make devlink_flash_update_begin_notify and
devlink_flash_update_end_notify static, and call them in devlink.c core
code. Always send the begin_notify just before calling the driver's
flash_update routine. Always send the end_notify just after the routine
returns regardless of success or failure.
Doing this makes the status notification easier to use from the driver,
as it no longer needs to worry about catching failures and cleaning up
by calling devlink_flash_update_end_notify. It is now no longer possible
to do the wrong thing in this regard. We also save a couple of lines of
code in each driver.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All drivers which implement the devlink flash update support, with the
exception of netdevsim, use either request_firmware or
request_firmware_direct to locate the firmware file. Rather than having
each driver do this separately as part of its .flash_update
implementation, perform the request_firmware within net/core/devlink.c
Replace the file_name parameter in the struct devlink_flash_update_params
with a pointer to the fw object.
Use request_firmware rather than request_firmware_direct. Although most
Linux distributions today do not have the fallback mechanism
implemented, only about half the drivers used the _direct request, as
compared to the generic request_firmware. In the event that
a distribution does support the fallback mechanism, the devlink flash
update ought to be able to use it to provide the firmware contents. For
distributions which do not support the fallback userspace mechanism,
there should be essentially no difference between request_firmware and
request_firmware_direct.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexei Starovoitov says:
====================
1) libbpf should not attempt to load unused subprogs, from Andrii.
2) Make strncpy_from_user() mask out bytes after NUL terminator, from Daniel.
3) Relax return code check for subprograms in the BPF verifier, from Dmitrii.
4) Fix several sockmap issues, from John.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
fail_function: Remove a redundant mutex unlock
selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL
lib/strncpy_from_user.c: Mask out bytes after NUL terminator.
libbpf: Fix VERSIONED_SYM_COUNT number parsing
bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list
bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self
bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self
bpf, sockmap: Use truesize with sk_rmem_schedule()
bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect
bpf, sockmap: Fix partial copy_page_to_iter so progress can still be made
selftests/bpf: Fix error return code in run_getsockopt_test()
bpf: Relax return code check for subprograms
tools, bpftool: Add missing close before bpftool net attach exit
MAINTAINERS/bpf: Update Andrii's entry.
selftests/bpf: Fix unused attribute usage in subprogs_unused test
bpf: Fix unsigned 'datasec_id' compared with zero in check_pseudo_btf_id
bpf: Fix passing zero to PTR_ERR() in bpf_btf_printf_prepare
libbpf: Don't attempt to load unused subprog as an entry-point BPF program
====================
Link: https://lore.kernel.org/r/20201119200721.288-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sparse complaints 3 times about:
net/smc/smc_ib.c:203:52: warning: incorrect type in argument 1 (different address spaces)
net/smc/smc_ib.c:203:52: expected struct net_device const *dev
net/smc/smc_ib.c:203:52: got struct net_device [noderef] __rcu *const ndev
Fix that by using the existing and validated ndev variable instead of
accessing attr->ndev directly.
Fixes: 5102eca903 ("net/smc: Use rdma_read_gid_l2_fields to L2 fields")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With the multi-subnet support of SMC-Dv2 the match for existing link
groups should not include the vlanid of the network device.
Set ini->smcd_version accordingly before the call to smc_conn_create()
and use this value in smc_conn_create() to skip the vlanid check.
Fixes: 5c21c4ccaf ("net/smc: determine accepted ISM devices")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IPV6=m
NF_DEFRAG_IPV6=y
ld: net/ipv6/netfilter/nf_conntrack_reasm.o: in function
`nf_ct_frag6_gather':
net/ipv6/netfilter/nf_conntrack_reasm.c:462: undefined reference to
`ipv6_frag_thdr_truncated'
Netfilter is depending on ipv6 symbol ipv6_frag_thdr_truncated. This
dependency is forcing IPV6=y.
Remove this dependency by moving ipv6_frag_thdr_truncated out of ipv6. This
is the same solution as used with a similar issues: Referring to
commit 70b095c843 ("ipv6: remove dependency of nf_defrag_ipv6 on ipv6
module")
Fixes: 9d9e937b1c ("ipv6/netfilter: Discard first fragment not including all headers")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Georg Kohmann <geokohma@cisco.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20201119095833.8409-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Struct br_vlan_stats duplicates pcpu_sw_netstats (apart from
br_vlan_stats not defining an alignment requirement), therefore
switch to using the latter one.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/04d25c3d-c5f6-3611-6d37-c2f40243dae2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
push_scqe() uses in_interrupt() to figure out if it is allowed to sleep.
The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.
Aside of that in_interrupt() is not correct as it does not catch preempt
disabled regions which neither can sleep.
ns_send() (the only caller of push_scqe()) has the following callers:
- vcc_sendmsg() used as proto_ops::sendmsg is expected to be invoked in
preemtible context.
-> vcc->dev->ops->send() (ns_send())
- atm_vcc::send via atmdev_ops::send either directly (pointer copied by
atm_init_aal34() or atm_init_aal5()) or via atm_send_aal0().
This is invoked by drivers (like br2684, clip, pppoatm, ...) which are
called from net_device_ops::ndo_start_xmit with BH disabled.
Add atmdev_ops::send_bh which is used by callers from BH context
(atm_send_aal*()) and if this callback missing then ::send is used
instead.
Implement this callback in nicstar and use it to replace in_interrupt().
Cc: Chas Williams <3chas3@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DSA network devices rely on having their DSA management interface up and
running otherwise their ndo_open() will return -ENETDOWN. Without doing
this it would not be possible to use DSA devices as netconsole when
configured on the command line. These devices also do not utilize the
upper/lower linking so the check about the netpoll device having upper
is not going to be a problem.
The solution adopted here is identical to the one done for
net/ipv4/ipconfig.c with 728c02089a ("net: ipv4: handle DSA enabled
master network devices"), with the network namespace scope being
restricted to that of the process configuring netpoll.
Fixes: 04ff53f96a ("net: dsa: Add netconsole support")
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1605581105-35295-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Start MHI device channels so that transfers can be performed.
The MHI stack does not auto-start channels anymore.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Checking for ifdef CONFIG_x fails if CONFIG_x=m.
Use IS_ENABLED instead, which is true for both built-ins and modules.
Otherwise, a
> ip -4 route add 1.2.3.4/32 via inet6 fe80::2 dev eth1
fails with the message "Error: IPv6 support not enabled in kernel." if
CONFIG_IPV6 is `m`.
In the spirit of b8127113d0.
Fixes: d15662682d ("ipv4: Allow ipv6 gateway with ipv4 routes")
Cc: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Florian Klink <flokli@flokli.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201115224509.2020651-1-flokli@flokli.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nlmsg_cancel() needs to be called in the error path of
inet_req_diag_fill to cancel the message.
Fixes: d545caca82 ("net: inet: diag: expose the socket mark to privileged processes.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://lore.kernel.org/r/20201116082018.16496-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When skb has a frag_list its possible for skb_to_sgvec() to fail. This
happens when the scatterlist has fewer elements to store pages than would
be needed for the initial skb plus any of its frags.
This case appears rare, but is possible when running an RX parser/verdict
programs exposed to the internet. Currently, when this happens we throw
an error, break the pipe, and kfree the msg. This effectively breaks the
application or forces it to do a retry.
Lets catch this case and handle it by doing an skb_linearize() on any
skb we receive with frags. At this point skb_to_sgvec should not fail
because the failing conditions would require frags to be in place.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556576837.73229.14800682790808797635.stgit@john-XPS-13-9370
If the skb_verdict_prog redirects an skb knowingly to itself, fix your
BPF program this is not optimal and an abuse of the API please use
SK_PASS. That said there may be cases, such as socket load balancing,
where picking the socket is hashed based or otherwise picks the same
socket it was received on in some rare cases. If this happens we don't
want to confuse userspace giving them an EAGAIN error if we can avoid
it.
To avoid double accounting in these cases. At the moment even if the
skb has already been charged against the sockets rcvbuf and forward
alloc we check it again and do set_owner_r() causing it to be orphaned
and recharged. For one this is useless work, but more importantly we
can have a case where the skb could be put on the ingress queue, but
because we are under memory pressure we return EAGAIN. The trouble
here is the skb has already been accounted for so any rcvbuf checks
include the memory associated with the packet already. This rolls
up and can result in unnecessary EAGAIN errors in userspace read()
calls.
Fix by doing an unlikely check and skipping checks if skb->sk == sk.
Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556574804.73229.11328201020039674147.stgit@john-XPS-13-9370
If a socket redirects to itself and it is under memory pressure it is
possible to get a socket stuck so that recv() returns EAGAIN and the
socket can not advance for some time. This happens because when
redirecting a skb to the same socket we received the skb on we first
check if it is OK to enqueue the skb on the receiving socket by checking
memory limits. But, if the skb is itself the object holding the memory
needed to enqueue the skb we will keep retrying from kernel side
and always fail with EAGAIN. Then userspace will get a recv() EAGAIN
error if there are no skbs in the psock ingress queue. This will continue
until either some skbs get kfree'd causing the memory pressure to
reduce far enough that we can enqueue the pending packet or the
socket is destroyed. In some cases its possible to get a socket
stuck for a noticeable amount of time if the socket is only receiving
skbs from sk_skb verdict programs. To reproduce I make the socket
memory limits ridiculously low so sockets are always under memory
pressure. More often though if under memory pressure it looks like
a spurious EAGAIN error on user space side causing userspace to retry
and typically enough has moved on the memory side that it works.
To fix skip memory checks and skb_orphan if receiving on the same
sock as already assigned.
For SK_PASS cases this is easy, its always the same socket so we
can just omit the orphan/set_owner pair.
For backlog cases we need to check skb->sk and decide if the orphan
and set_owner pair are needed.
Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556572660.73229.12566203819812939627.stgit@john-XPS-13-9370
We use skb->size with sk_rmem_scheduled() which is not correct. Instead
use truesize to align with socket and tcp stack usage of sk_rmem_schedule.
Suggested-by: Daniel Borkman <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556570616.73229.17003722112077507863.stgit@john-XPS-13-9370
Fix sockmap sk_skb programs so that they observe sk_rcvbuf limits. This
allows users to tune SO_RCVBUF and sockmap will honor them.
We can refactor the if(charge) case out in later patches. But, keep this
fix to the point.
Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556568657.73229.8404601585878439060.stgit@john-XPS-13-9370
If copy_page_to_iter() fails or even partially completes, but with fewer
bytes copied than expected we currently reset sg.start and return EFAULT.
This proves problematic if we already copied data into the user buffer
before we return an error. Because we leave the copied data in the user
buffer and fail to unwind the scatterlist so kernel side believes data
has been copied and user side believes data has _not_ been received.
Expected behavior should be to return number of bytes copied and then
on the next read we need to return the error assuming its still there. This
can happen if we have a copy length spanning multiple scatterlist elements
and one or more complete before the error is hit.
The error is rare enough though that my normal testing with server side
programs, such as nginx, httpd, envoy, etc., I have never seen this. The
only reliable way to reproduce that I've found is to stream movies over
my browser for a day or so and wait for it to hang. Not very scientific,
but with a few extra WARN_ON()s in the code the bug was obvious.
When we review the errors from copy_page_to_iter() it seems we are hitting
a page fault from copy_page_to_iter_iovec() where the code checks
fault_in_pages_writeable(buf, copy) where buf is the user buffer. It
also seems typical server applications don't hit this case.
The other way to try and reproduce this is run the sockmap selftest tool
test_sockmap with data verification enabled, but it doesn't reproduce the
fault. Perhaps we can trigger this case artificially somehow from the
test tools. I haven't sorted out a way to do that yet though.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556566659.73229.15694973114605301063.stgit@john-XPS-13-9370
In async_resync mode, we log the TCP seq of records until the async request
is completed. Later, in case one of the logged seqs matches the resync
request, we return it, together with its record serial number. Before this
fix, we mistakenly returned the serial number of the current record
instead.
Fixes: ed9b7646b0 ("net/tls: Add asynchronous resync")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some identifiers have different names between their prototypes
and the kernel-doc markup.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The DLCI driver (dlci.c) implements the Frame Relay protocol. However,
we already have another newer and better implementation of Frame Relay
provided by the HDLC_FR driver (hdlc_fr.c).
The DLCI driver's implementation of Frame Relay is used by only one
hardware driver in the kernel - the SDLA driver (sdla.c).
The SDLA driver provides Frame Relay support for the Sangoma S50x devices.
However, the vendor provides their own driver (along with their own
multi-WAN-protocol implementations including Frame Relay), called WANPIPE.
I believe most users of the hardware would use the vendor-provided WANPIPE
driver instead.
(The WANPIPE driver was even once in the kernel, but was deleted in
commit 8db60bcf30 ("[WAN]: Remove broken and unmaintained Sangoma
drivers.") because the vendor no longer updated the in-kernel WANPIPE
driver.)
Cc: Mike McLagan <mike.mclagan@linux.org>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201114150921.685594-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce batched descriptor interfaces in the xsk core code for the
Tx path to be used in the driver to write a code path with higher
performance. This interface will be used by the i40e driver in the
next patch. Though other drivers would likely benefit from this new
interface too.
Note that batching is only implemented for the common case when
there is only one socket bound to the same device and queue id. When
this is not the case, we fall back to the old non-batched version of
the function.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-5-git-send-email-magnus.karlsson@gmail.com
Introduce one cache line worth of padding between the consumer pointer
and the flags field as well as between the flags field and the start
of the descriptors in all the lockless rings. This so that the x86 HW
adjacency prefetcher will not prefetch the adjacent pointer/field when
only one pointer/field is going to be used. This improves throughput
performance for the l2fwd sample app with 1% on my machine with HW
prefetching turned on in the BIOS.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-4-git-send-email-magnus.karlsson@gmail.com
During loss recovery, retransmitted packets are forced to use TCP
timestamps to calculate the RTT samples, which have a millisecond
granularity. BBR is designed using a microsecond granularity. As a
result, multiple RTT samples could be truncated to the same RTT value
during loss recovery. This is problematic, as BBR will not enter
PROBE_RTT if the RTT sample is <= the current min_rtt sample, meaning
that if there are persistent losses, PROBE_RTT will constantly be
pushed off and potentially never re-entered. This patch makes sure
that BBR enters PROBE_RTT by checking if RTT sample is < the current
min_rtt sample, rather than <=.
The Netflix transport/TCP team discovered this bug in the Linux TCP
BBR code during lab tests.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Ryan Sharpelletti <sharpelletti@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20201116174412.1433277-1-sharpelletti.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use a consistent style of one-line/multi-line comments throughout the
file.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ethertype DSA encodes exactly the same information in the DSA tag as
the non-ethertype variety. So refactor out the common parts and reuse
them for both protocols.
This is ensures tag parsing and generation is always consistent across
all mv88e6xxx chips.
While we are at it, explicitly deal with all possible CPU codes on
receive, making sure to set offload_fwd_mark as appropriate.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When receiving an IGMP/MLD frame with a TO_CPU tag, the switch has not
performed any forwarding of it. This means that we should not set the
offload_fwd_mark on the skb, in case a software bridge wants it
forwarded.
This is a port of:
1ed9ec9b08 ("dsa: Allow forwarding of redirected IGMP traffic")
Which corrected the issue for chips using EDSA tags, but not for those
using regular DSA tags.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If tcp socket has more data than Encrypted Handshake Message then
tls_sw_recvmsg will try to decrypt next record instead of returning
full control message to userspace as mentioned in comment. The next
message - usually Application Data - gets corrupted because it uses
zero copy for decryption that's why the data is not stored in skb
for next iteration. Revert check to not decrypt next record if
current is not Application Data.
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1605413760-21153-1-git-send-email-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The intention of the current check is to avoid using bpf_sk_storage
in irq and nmi. Jakub pointed out that the current check cannot
do that. For example, in_serving_softirq() returns true
if the softirq handling is interrupted by hard irq.
Fixes: 8e4597c627 ("bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201116200113.2868539-1-kafai@fb.com
In br_forward.c and br_input.c fields dev->stats.tx_dropped and
dev->stats.multicast are populated, but they are ignored in
ndo_get_stats64.
Fixes: 28172739f0 ("net: fix 64 bit counters on 32 bit arches")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the worker moves some bytes from the OoO queue into
the receive queue, the msk->ask_seq is updated, the MPTCP-level
ack carrying that value needs to wait the next ingress packet,
possibly slowing down or hanging the peer
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before sending 'x' new bytes also check that the new snd_una would
be within the permitted receive window.
For every ACK that also contains a DSS ack, check whether its tcp-level
receive window would advance the current mptcp window right edge and
update it if so.
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at
least one subflow and the mptcp socket itself are writeable.
mptcp_poll returns EPOLLOUT if the bit is set.
mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when last write
has used up all subflows or the mptcp socket wmem.
This reworks nospace handling as follows:
MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, i.e. inverted meaning.
This bit is set when the mptcp socket is not writeable.
The mptcp-level ack path schedule will then schedule the mptcp worker
to allow it to free already-acked data (and reduce wmem usage).
This will then wake userspace processes that wait for a POLLOUT event.
sendmsg will set MPTCP_NOSPACE only when it has to wait for more
wmem (blocking I/O case).
poll path will set MPTCP_NOSPACE in case the mptcp socket is
not writeable.
Normal tcp-level notification (SOCK_NOSPACE) is only enabled
in case the subflow socket has no available wmem.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After the previous patch we may end-up with unsent data
in the write buffer. If such buffer is full, the writer
will block for unlimited time.
We need to trigger the MPTCP xmit path even for the
subflow rx path, on MPTCP snd_una updates.
Keep things simple and just schedule the work queue if
needed.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mptcp_sendmsg() is refactored so that first it copies
the data provided from user space into the send queue,
and then tries to spool the send queue via sendmsg_frag.
There a subtle change in the mptcp level collapsing on
consecutive data fragment: we now allow that only on unsent
data.
The latter don't need to deal with msghdr data anymore
and can be simplified in a relevant way.
snd_nxt and write_seq are now tracked independently.
Overall this allows some relevant cleanup and will
allow sending pending mptcp data on msk una update in
later patch.
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We must not close the subflows before all the MPTCP level
data, comprising the DATA_FIN has been acked at the MPTCP
level, otherwise we could be unable to retransmit as needed.
__mptcp_wr_shutdown() shutdown is responsible to check for the
correct status and close all subflows. Is called by the output
path after spooling any data and at shutdown/close time.
In a similar way, __mptcp_destroy_sock() is responsible to clean-up
the MPTCP level status, and is called when the msk transition
to TCP_CLOSE.
The protocol level close() does not force anymore the TCP_CLOSE
status, but orphan the msk socket and all the subflows.
Orphaned msk sockets are forciby closed after a timeout or
when all MPTCP-level data is acked.
There is a caveat about keeping the orphaned subflows around:
the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
tcp_close(). To prevent accessing freed memory on later MPTCP
level operations, the msk acquires a reference to each subflow
socket and prevent subflow_ulp_release() from releasing the
subflow context before __mptcp_destroy_sock().
The additional subflow references are released by __mptcp_done()
and the async ULP release is detected checking ULP ops. If such
field has been already cleared by the ULP release path, the
dangling context is freed directly by __mptcp_done().
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Track the next MPTCP sequence number used on xmit,
currently always equal to write_next.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Preparation patch to track the data pending in the msk
write queue. No functional change introduced here
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current argument list is pretty long and quite unreadable,
move many of them into a specific struct. Later patches
will add more stuff to such struct.
Additionally drop the 'timeo' argument, now unused.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
remove some of code duplications an allow preventing
rescheduling on close.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
unlocked version of protocol level close, will be used by
MPTCP to allow decouple orphaning and subflow level close.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mptcp_push_pending() is called even on orphaned
msk (and orphaned subflows), if there is outstanding
data at close() time.
To cope with the above MPTCP needs to handle explicitly
the allocation failure on xmit. The newly introduced
do_tcp_sendfrag() allows that, just plug it.
We can additionally drop a couple of sanity checks,
duplicate in the TCP code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Will be needed by the next patch, as MPTCP needs to handle
directly the error/memory-allocation-needed path.
No functional changes intended.
Additionally let MPTCP code access the tcp_remove_empty_skb()
helper.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Packets are processed even though the first fragment don't include all
headers through the upper layer header. This breaks TAHI IPv6 Core
Conformance Test v6LC.1.3.6.
Referring to RFC8200 SECTION 4.5: "If the first fragment does not include
all headers through an Upper-Layer header, then that fragment should be
discarded and an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero."
The fragment needs to be validated the same way it is done in
commit 2efdaaaf88 ("IPv6: reply ICMP error if the first fragment don't
include all headers") for ipv6. Wrap the validation into a common function,
ipv6_frag_thdr_truncated() to check for truncation in the upper layer
header. This validation does not fullfill all aspects of RFC 8200,
section 4.5, but is at the moment sufficient to pass mentioned TAHI test.
In netfilter, utilize the fragment offset returned by find_prev_fhdr() to
let ipv6_frag_thdr_truncated() start it's traverse from the fragment
header.
Return 0 to drop the fragment in the netfilter. This is the same behaviour
as used on other protocol errors in this function, e.g. when
nf_ct_frag6_queue() returns -EPROTO. The Fragment will later be picked up
by ipv6_frag_rcv() in reassembly.c. ipv6_frag_rcv() will then send an
appropriate ICMP Parameter Problem message back to the source.
References commit 2efdaaaf88 ("IPv6: reply ICMP error if the first
fragment don't include all headers")
Signed-off-by: Georg Kohmann <geokohma@cisco.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20201111115025.28879-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nla_strlcpy now returns -E2BIG if src was truncated when written to dst.
It also returns this error value if dstsize is 0 or higher than INT_MAX.
For example, if src is "foo\0" and dst is 3 bytes long, the result will be:
1. "foG" after memcpy (G means garbage).
2. "fo\0" after memset.
3. -E2BIG is returned because src was not completely written into dst.
The callers of nla_strlcpy were modified to take into account this modification.
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl+xZoMTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRCpyVqK+u3vqX8lB/4gQq9MTczYjSxVGgqaEyToftX7rokl
EiP6RcpXKvAY8p0IJHEWHeCGOqsD5am4xo6M/Z5nMAinBdywweSvQRdU8ZDdCb84
AxJEbrkU7JTNqHbQnLp3wzl5sHPAwvRavf9plFwnPosMM2nS0HcPnjaj9kgO3cLI
E/fMa+WmPwDjMEIBM2TJCgFh9pH5Fs8I3SRHC/TUYwigXKVtgIuIm1o6t4evBk9w
zBW+5HX++r2FSt0164vVZyBCOCK/z2tEMCEO8l3mrUKdjBy3jAFqXfWxcQbaFy8f
Oy8Nx1vgNZYVH7RqYJ6JuQO6/wQIPAWb6KOgDrDWaM6jl3ZdA7/gBdiR
=Ed6S
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-5.10-20201115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-15
Anant Thazhemadam contributed two patches for the AF_CAN that prevent potential
access of uninitialized member in can_rcv() and canfd_rcv().
The next patch is by Alejandro Concepcion Rodriguez and changes can_restart()
to use the correct function to push a skb into the networking stack from
process context.
Zhang Qilong's patch fixes a memory leak in the error path of the ti_hecc's
probe function.
A patch by me fixes mcba_usb_start_xmit() function in the mcba_usb driver, to
first fill the skb and then pass it to can_put_echo_skb().
Colin Ian King's patch fixes a potential integer overflow on shift in the
peak_usb driver.
The next two patches target the flexcan driver, a patch by me adds the missing
"req_bit" to the stop mode property comment (which was broken during net-next
for v5.10). Zhang Qilong's patch fixes the failure handling of
pm_runtime_get_sync().
The next seven patches target the m_can driver including the tcan4x5x spi
driver glue code. Enric Balletbo i Serra's patch for the tcan4x5x Kconfig fix
the REGMAP_SPI dependency handling. A patch by me for the tcan4x5x driver's
probe() function adds missing error handling to for devm_regmap_init(), and in
tcan4x5x_can_remove() the order of deregistration is fixed. Wu Bo's patch for
the m_can driver fixes the state change handling in
m_can_handle_state_change(). Two patches by Dan Murphy first introduce
m_can_class_free_dev() and then make use of it to fix the freeing of the can
device. A patch by Faiz Abbas add a missing shutdown of the CAN controller in
the m_can_stop() function.
* tag 'linux-can-fixes-for-5.10-20201115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: m_can: m_can_stop(): set device to software init mode before closing
can: m_can: Fix freeing of can device from peripherials
can: m_can: m_can_class_free_dev(): introduce new function
can: m_can: m_can_handle_state_change(): fix state change
can: tcan4x5x: tcan4x5x_can_remove(): fix order of deregistration
can: tcan4x5x: tcan4x5x_can_probe(): add missing error checking for devm_regmap_init()
can: tcan4x5x: replace depends on REGMAP_SPI with depends on SPI
can: flexcan: fix failure handling of pm_runtime_get_sync()
can: flexcan: flexcan_setup_stop_mode(): add missing "req_bit" to stop mode property comment
can: peak_usb: fix potential integer overflow on shift of a int
can: mcba_usb: mcba_usb_start_xmit(): first fill skb, then pass to can_put_echo_skb()
can: ti_hecc: Fix memleak in ti_hecc_probe
can: dev: can_restart(): post buffer from the right context
can: af_can: prevent potential access of uninitialized member in canfd_rcv()
can: af_can: prevent potential access of uninitialized member in can_rcv()
====================
Link: https://lore.kernel.org/r/20201115174131.2089251-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In canfd_rcv(), cfd->len is uninitialized when skb->len = 0, and this
uninitialized cfd->len is accessed nonetheless by pr_warn_once().
Fix this uninitialized variable access by checking cfd->len's validity
condition (cfd->len > CANFD_MAX_DLEN) separately after the skb->len's
condition is checked, and appropriately modify the log messages that
are generated as well.
In case either of the required conditions fail, the skb is freed and
NET_RX_DROP is returned, same as before.
Fixes: d468984688 ("can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once")
Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com
Tested-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Link: https://lore.kernel.org/r/20201103213906.24219-3-anant.thazhemadam@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In can_rcv(), cfd->len is uninitialized when skb->len = 0, and this
uninitialized cfd->len is accessed nonetheless by pr_warn_once().
Fix this uninitialized variable access by checking cfd->len's validity
condition (cfd->len > CAN_MAX_DLEN) separately after the skb->len's
condition is checked, and appropriately modify the log messages that
are generated as well.
In case either of the required conditions fail, the skb is freed and
NET_RX_DROP is returned, same as before.
Fixes: 8cb68751c1 ("can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once")
Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com
Tested-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Link: https://lore.kernel.org/r/20201103213906.24219-2-anant.thazhemadam@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If THIS_MODULE is not set, the module would be removed while debugfs is
being used.
It eventually makes kernel panic.
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Commit d3fd65484c ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.
Use this function instead of own code.
While on it, remove xfrmi_get_stats64() and replace it with
dev_get_tstats64().
Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215939.147007-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d3fd65484c ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.
Use this function instead of own code.
While on it, remove internal_get_stats() and replace it
with dev_get_tstats64().
Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215336.145998-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If sb_occ_port_pool_get() failed in devlink_nl_sb_port_pool_fill(),
msg should be canceled by genlmsg_cancel().
Fixes: df38dafd25 ("devlink: implement shared buffer occupancy monitoring interface")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://lore.kernel.org/r/20201113111622.11040-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both IPv4 and IPv6 needs it via a function pointer.
Following patch will avoid the indirect call.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds an IPv4 routes encapsulation attribute
to the result of netlink RTM_GETROUTE requests
(e.g. ip route get 192.0.2.1).
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201113085517.GA1307262@tws
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Static checking revealed that a previous fix to
netlbl_unlabel_staticlist() leaves a stack variable uninitialized,
this patches fixes that.
Fixes: 866358ec33 ("netlabel: fix our progress tracking in netlbl_unlabel_staticlist()")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/r/160530304068.15651.18355773009751195447.stgit@sifl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit bdb7cc643f ("ipv6: Count interface receive statistics on the
ingress netdev") removed all callees for ipv6_skb_idev(). Hence, since
then, ipv6_skb_idev() is unused and make CC=clang W=1 warns:
net/ipv6/exthdrs.c:909:33:
warning: unused function 'ipv6_skb_idev' [-Wunused-function]
So, remove this unused function and a -Wunused-function warning.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lore.kernel.org/r/20201113135012.32499-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A call trace was found in Hangbin's Codenomicon testing with debug kernel:
[ 2615.981988] ODEBUG: free active (active state 0) object type: timer_list hint: sctp_generate_proto_unreach_event+0x0/0x3a0 [sctp]
[ 2615.995050] WARNING: CPU: 17 PID: 0 at lib/debugobjects.c:328 debug_print_object+0x199/0x2b0
[ 2616.095934] RIP: 0010:debug_print_object+0x199/0x2b0
[ 2616.191533] Call Trace:
[ 2616.194265] <IRQ>
[ 2616.202068] debug_check_no_obj_freed+0x25e/0x3f0
[ 2616.207336] slab_free_freelist_hook+0xeb/0x140
[ 2616.220971] kfree+0xd6/0x2c0
[ 2616.224293] rcu_do_batch+0x3bd/0xc70
[ 2616.243096] rcu_core+0x8b9/0xd00
[ 2616.256065] __do_softirq+0x23d/0xacd
[ 2616.260166] irq_exit+0x236/0x2a0
[ 2616.263879] smp_apic_timer_interrupt+0x18d/0x620
[ 2616.269138] apic_timer_interrupt+0xf/0x20
[ 2616.273711] </IRQ>
This is because it holds asoc when transport->proto_unreach_timer starts
and puts asoc when the timer stops, and without holding transport the
transport could be freed when the timer is still running.
So fix it by holding/putting transport instead for proto_unreach_timer
in transport, just like other timers in transport.
v1->v2:
- Also use sctp_transport_put() for the "out_unlock:" path in
sctp_generate_proto_unreach_event(), as Marcelo noticed.
Fixes: 50b5d6ad63 ("sctp: Fix a race between ICMP protocol unreachable and connect()")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/102788809b554958b13b95d33440f5448113b8d6.1605331373.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before commit c0cfa2d8a7 ("vsock: add multi-transports support"),
if a G2H transport was loaded (e.g. virtio transport), every packets
was forwarded to the host, regardless of the destination CID.
The H2G transports implemented until then (vhost-vsock, VMCI) always
responded with an error, if the destination CID was not
VMADDR_CID_HOST.
From that commit, we are using the remote CID to decide which
transport to use, so packets with remote CID > VMADDR_CID_HOST(2)
are sent only through H2G transport. If no H2G is available, packets
are discarded directly in the guest.
Some use cases (e.g. Nitro Enclaves [1]) rely on the old behaviour
to implement sibling VMs communication, so we restore the old
behavior when no H2G is registered.
It will be up to the host to discard packets if the destination is
not the right one. As it was already implemented before adding
multi-transport support.
Tested with nested QEMU/KVM by me and Nitro Enclaves by Andra.
[1] Documentation/virt/ne_overview.rst
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Dexuan Cui <decui@microsoft.com>
Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reported-by: Andra Paraschiv <andraprs@amazon.com>
Tested-by: Andra Paraschiv <andraprs@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201112133837.34183-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-11-14
1) Add BTF generation for kernel modules and extend BTF infra in kernel
e.g. support for split BTF loading and validation, from Andrii Nakryiko.
2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
on inlined branch conditions, from Alexei Starovoitov.
3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.
4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
infra, from Martin KaFai Lau.
5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
XDP_REDIRECT path, from Lorenzo Bianconi.
6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
context, from Song Liu.
7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
for different target archs on same source tree, from Jean-Philippe Brucker.
8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.
9) Move functionality from test_tcpbpf_user into the test_progs framework so it
can run in BPF CI, from Alexander Duyck.
10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.
Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.
[0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/
* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
net: mlx5: Add xdp tx return bulking support
net: mvpp2: Add xdp tx return bulking support
net: mvneta: Add xdp tx return bulking support
net: page_pool: Add bulk support for ptr_ring
net: xdp: Introduce bulking for xdp tx return path
bpf: Expose bpf_d_path helper to sleepable LSM hooks
bpf: Augment the set of sleepable LSM hooks
bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Rename some functions in bpf_sk_storage
bpf: Folding omem_charge() into sk_storage_charge()
selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
selftests/bpf: Add skb_pkt_end test
bpf: Support for pointers beyond pkt_end.
tools/bpf: Always run the *-clean recipes
tools/bpf: Add bootstrap/ to .gitignore
bpf: Fix NULL dereference in bpf_task_storage
tools/bpftool: Fix build slowdown
tools/runqslower: Build bpftool using HOSTCC
tools/runqslower: Enable out-of-tree build
...
====================
Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
genlmsg_cancel() needs to be called in the error path of
inet6_fill_ifmcaddr and inet6_fill_ifacaddr to cancel
the message.
Fixes: 6ecf4c37eb ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20201112080950.1476302-1-zhangqilong3@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce the capability to batch page_pool ptr_ring refill since it is
usually run inside the driver NAPI tx completion loop.
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually run
inside the driver NAPI tx completion loop.
The bulk queue size is set to 16 to be aligned to how
XDP_REDIRECT bulking works. The bulk is flushed when
it is full or when mem.id changes.
xdp_frame_bulk is usually stored/allocated on the function
call-stack to avoid locking penalties.
Current implementation considers only page_pool memory model.
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
Commit 58956317c8 ("neighbor: Improve garbage collection")
guarantees neighbour table entries a five-second lifetime. Processes
which make heavy use of multicast can fill the neighour table with
multicast addresses in five seconds. At that point, neighbour entries
can't be GC-ed because they aren't five seconds old yet, the kernel
log starts to fill up with "neighbor table overflow!" messages, and
sends start to fail.
This patch allows multicast addresses to be thrown out before they've
lived out their five seconds. This makes room for non-multicast
addresses and makes messages to all addresses more reliable in these
circumstances.
Fixes: 58956317c8 ("neighbor: Improve garbage collection")
Signed-off-by: Jeff Dike <jdike@akamai.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201113015815.31397-1-jdike@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace strncpy() with strscpy(), fixes the following warning:
In function 'bearer_name_validate',
inlined from 'tipc_enable_bearer' at net/tipc/bearer.c:246:7:
net/tipc/bearer.c:141:2: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
strncpy(name_copy, name, TIPC_MAX_BEARER_NAME);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Link: https://lore.kernel.org/r/20201112093442.8132-1-wenlin.kang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* injection/radiotap updates for new test capabilities
* remove WDS support - even years ago when we turned
it off by default it was already basically unusable
* support for HE (802.11ax) rates for beacons
* support for some vendor-specific HE rates
* many other small features/cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl+uSqIACgkQB8qZga/f
l8QBOw/6AwlcQWMjqdb6H/QRORA81E4tX2+alHbeBai7KSI+9E1Jtakmn5qKQ4iH
IjpNWPsclj4zKhgbKaariIn/bZEk8OhzmDpssnHTMpuo3iuCmuzFaDdZd9Uun2Ad
tr3bqfHaom1MhWRF/FuBSHcnk599qRnsk+RY7/6dhjiPlWOWJvsfpuo1KblVoFWU
wYDX+W2oYDAx44O/6AGJ0Zctwf6m7Kyzb2aMIqv2fwacBoDvyVdTIT/4NroV9INI
QvIY4Gi8hoCDQX39zwaxSWOq7uFLYHwUozzZxktS5c4N3eSVFs80jmdiQiMKmKRQ
A+R+ZcuFBcC+6+Wt4x+20T2mF6pUvSaIDA4jegCbDL4jQlp+023XTMlV42cnpP0z
hFZgBWJszLnLtj4KW/v3sXefZ1Pxl0WD4BHNqz8SMzMUaWalrXP4Gt2bnjB7Bx1N
2M/DjW570eNZeZ9ZFcvkwHysCWMzHKmh5sPXnOitrs4s2hweIrO7wnMlYVLAGF1J
m8jUoqpI9Cc7dFEg0inaSIddcjobcx9i2eG14zaZnXj0t8WqAbQqI0Lw/mipWXFY
7DfdjFULI+Yru46TAFbiisFo/2dlijxrIr3d3QK21Cwklb3BPhpiDf83q6HYhNpB
xPs38OCZaNdSL7TwNRcuZ2jmBCf+48SYgse85HQOgdD2QzJv6dU=
=TGgF
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Some updates:
* injection/radiotap updates for new test capabilities
* remove WDS support - even years ago when we turned
it off by default it was already basically unusable
* support for HE (802.11ax) rates for beacons
* support for some vendor-specific HE rates
* many other small features/cleanups
* tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (21 commits)
nl80211: fix kernel-doc warning in the new SAE attribute
cfg80211: remove WDS code
mac80211: remove WDS-related code
rt2x00: remove WDS code
b43legacy: remove WDS code
b43: remove WDS code
carl9170: remove WDS code
ath9k: remove WDS code
wireless: remove CONFIG_WIRELESS_WDS
mac80211: assure that certain drivers adhere to DONT_REORDER flag
mac80211: don't overwrite QoS TID of injected frames
mac80211: adhere to Tx control flag that prevents frame reordering
mac80211: add radiotap flag to assure frames are not reordered
mac80211: save HE oper info in BSS config for mesh
cfg80211: add support to configure HE MCS for beacon rate
nl80211: fix beacon tx rate mask validation
nl80211/cfg80211: fix potential infinite loop
cfg80211: Add support to calculate and report 4096-QAM HE rates
cfg80211: Add support to configure SAE PWE value to drivers
ieee80211: Add definition for WFA DPP
...
====================
Link: https://lore.kernel.org/r/20201113101148.25268-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* a use-after-free fix in rfkill
* a memory leak fix in the mac80211 TX status path
* some rate scaling fixes
* a fix for the often-reported (by syzbot) sleeping
in atomic issue with mac80211's station removal
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl+uSaYACgkQB8qZga/f
l8Q83A/8DaRC1M1nuhnPeMXJd73qxPk58+WqmBJU8sQEGtyCHyQZ6EdHUgW/V4ww
Yh4vYBOoiWGG6JC0N0G02O4SxGmk3hZyTdxJ8WxnsM7pJ4yzWufduJrYvnx59mbh
muw+WgEYLdyWYyzn81/rmfvQKfeFiic4pwSBuru2o0Vhb2bNYeUpYIMmdHkzlEyV
BEwdVGCSvKWuxXdGCbplAJ79MbN6jP03y1+u7ZLYz7P+OvZZVQqUFouNuHUfrLUs
nS8sgRToFTDt1Hu9GagRvbvqpwXGFgH+4LpMwLLa4mFjPcCC2UPewbiSIk7EU4tU
wCr9pRlQXmcD+yVuWvwEQJvHQgkzm/Krx517QteXYB/GLmNC0A43kyQK+CEDlk56
Fj0hk3UmXmek2EBEI/jLURlsJr7QMWqRRXXVK2VIvEButP+iWmPPyZ01Tkfjz5IN
jHl8ahiSqX3kcCZLldlvkmtwqKKtgE4coQnPVOJJRBKYX5YpwLm0rClTsvx6XMe2
zsegduwv6il9v3ezNudfdkcPtPfptOrkkvHgBfw6VgmvxQA1iuC8XDt/5lTpw/lt
hshBRFvKpBTp0x52nkvBs7b7qO8qM55hEZF7NISJOyLbO1VAJPt4zZEFukip+zow
QgA556OfDrA5bHDmuYIChBkJFOKeTI7J8DYQy65vloowg6OpkYg=
=r2Wn
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-net-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A handful of fixes:
* a use-after-free fix in rfkill
* a memory leak fix in the mac80211 TX status path
* some rate scaling fixes
* a fix for the often-reported (by syzbot) sleeping
in atomic issue with mac80211's station removal
* tag 'mac80211-for-net-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
mac80211: free sta in sta_info_insert_finish() on errors
mac80211: minstrel: fix tx status processing corner case
mac80211: minstrel: remove deferred sampling code
mac80211: fix memory leak on filtered powersave frames
rfkill: Fix use-after-free in rfkill_resume()
====================
Link: https://lore.kernel.org/r/20201113093421.24025-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch enables the FENTRY/FEXIT/RAW_TP tracing program to use
the bpf_sk_storage_(get|delete) helper, so those tracing programs
can access the sk's bpf_local_storage and the later selftest
will show some examples.
The bpf_sk_storage is currently used in bpf-tcp-cc, tc,
cg sockops...etc which is running either in softirq or
task context.
This patch adds bpf_sk_storage_get_tracing_proto and
bpf_sk_storage_delete_tracing_proto. They will check
in runtime that the helpers can only be called when serving
softirq or running in a task context. That should enable
most common tracing use cases on sk.
During the load time, the new tracing_allowed() function
will ensure the tracing prog using the bpf_sk_storage_(get|delete)
helper is not tracing any bpf_sk_storage*() function itself.
The sk is passed as "void *" when calling into bpf_local_storage.
This patch only allows tracing a kernel function.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201112211313.2587383-1-kafai@fb.com
Rename some of the functions currently prefixed with sk_storage
to bpf_sk_storage. That will make the next patch have fewer
prefix check and also bring the bpf_sk_storage.c to a more
consistent function naming.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211307.2587021-1-kafai@fb.com
sk_storage_charge() is the only user of omem_charge().
This patch simplifies it by folding omem_charge() into
sk_storage_charge().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211301.2586255-1-kafai@fb.com
The x25_disconnect function in x25_subr.c would decrease the refcount of
"x25->neighbour" (struct x25_neigh) and reset this pointer to NULL.
However, the x25_rx_call_request function in af_x25.c, which is called
when we receive a connection request, does not increase the refcount when
it assigns the pointer.
Fix this issue by increasing the refcount of "struct x25_neigh" in
x25_rx_call_request.
This patch fixes frequent kernel crashes when using AF_X25 sockets.
Fixes: 4becb7ee5b ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201112103506.5875-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a user unbinds and re-binds a NC-SI aware driver the kernel will
attempt to register the netlink interface at runtime. The structure is
marked __ro_after_init so registration fails spectacularly at this point.
# echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/unbind
# echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/bind
ftgmac100 1e660000.ethernet: Read MAC address 52:54:00:12:34:56 from chip
ftgmac100 1e660000.ethernet: Using NCSI interface
8<--- cut here ---
Unable to handle kernel paging request at virtual address 80a8f858
pgd = 8c768dd6
[80a8f858] *pgd=80a0841e(bad)
Internal error: Oops: 80d [#1] SMP ARM
CPU: 0 PID: 116 Comm: sh Not tainted 5.10.0-rc3-next-20201111-00003-gdd25b227ec1e #51
Hardware name: Generic DT based system
PC is at genl_register_family+0x1f8/0x6d4
LR is at 0xff26ffff
pc : [<8073f930>] lr : [<ff26ffff>] psr: 20000153
sp : 8553bc80 ip : 81406244 fp : 8553bd04
r10: 8085d12c r9 : 80a8f73c r8 : 85739000
r7 : 00000017 r6 : 80a8f860 r5 : 80c8ab98 r4 : 80a8f858
r3 : 00000000 r2 : 00000000 r1 : 81406130 r0 : 00000017
Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
Control: 00c5387d Table: 85524008 DAC: 00000051
Process sh (pid: 116, stack limit = 0x1f1988d6)
...
Backtrace:
[<8073f738>] (genl_register_family) from [<80860ac0>] (ncsi_init_netlink+0x20/0x48)
r10:8085d12c r9:80c8fb0c r8:85739000 r7:00000000 r6:81218000 r5:85739000
r4:8121c000
[<80860aa0>] (ncsi_init_netlink) from [<8085d740>] (ncsi_register_dev+0x1b0/0x210)
r5:8121c400 r4:8121c000
[<8085d590>] (ncsi_register_dev) from [<805a8060>] (ftgmac100_probe+0x6e0/0x778)
r10:00000004 r9:80950228 r8:8115bc10 r7:8115ab00 r6:9eae2c24 r5:813b6f88
r4:85739000
[<805a7980>] (ftgmac100_probe) from [<805355ec>] (platform_drv_probe+0x58/0xa8)
r9:80c76bb0 r8:00000000 r7:80cd4974 r6:80c76bb0 r5:8115bc10 r4:00000000
[<80535594>] (platform_drv_probe) from [<80532d58>] (really_probe+0x204/0x514)
r7:80cd4974 r6:00000000 r5:80cd4868 r4:8115bc10
Jakub pointed out that ncsi_register_dev is obviously broken, because
there is only one family so it would never work if there was more than
one ncsi netdev.
Fix the crash by registering the netlink family once on boot, and drop
the code to unregister it.
Fixes: 955dc68cb9 ("net/ncsi: Add generic netlink family")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Link: https://lore.kernel.org/r/20201112061210.914621-1-joel@jms.id.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When dumping the name and NTP servers advertised by DHCP, a blank line
is emitted if either of the lists is empty. This can lead to confusing
issues such as the blank line getting flagged as warning. This happens
because the blank line is the result of pr_cont("\n") and that may see
its level corrupted by some other driver concurrently writing to the
console.
Fix this by making sure that the terminating newline is only emitted
if at least one entry in the lists was printed before.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20201110073757.1284594-1-thierry.reding@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The initialization for 'err' with '-ENOSYS' is redundant and
can be removed, as it is updated soon and not used.
Changes since v1:
- Move the err declaration below struct sock *sk
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/5faa01d5.1c69fb81.8451c.cb5b@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - regressions:
- arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for ENETC
Current release - bugs in new features:
- mptcp: provide rmem[0] limit offset to fix oops
Previous release - regressions:
- IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU
calculations
- lan743x: correctly handle chips with internal PHY
- bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
- mlx5e: Fix VXLAN port table synchronization after function reload
Previous release - always broken:
- bpf: Zero-fill re-used per-cpu map element
- net: udp: fix out-of-order packets when forwarding with UDP GSO
fraglists turned on
- fix UDP header access on Fast/frag0 UDP GRO
- fix IP header access and skb lookup on Fast/frag0 UDP GRO
- ethtool: netlink: add missing netdev_features_change() call
- net: Update window_clamp if SOCK_RCVBUF is set
- igc: Fix returning wrong statistics
- ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload
- tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies
- r8169: disable hw csum for short packets on all chip versions
- vrf: Fix fast path output packet handling with async Netfilter rules
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+thbIACgkQMUZtbf5S
Irsy0RAAhYIYDNMSkQhcVcQPMxbtStwgTtKrWxg/D2zh3Kg+B4oRgoNZnt9kmlHX
Su/aRWbTWBkDIMxIWBfRsO3z5zSQm4yLG1FTlfsOcWzOJcsntCO8SzikyxtnEZK8
Bpi7dOoKB6KF0V2YjM9AHh5fbXvS7KJfp/PjZ7Kpn5BEbFV8rKtIyiJxwXXZUr6O
ddM9Om4i0zf+dmsY1HVEyowPQMVB3vbn8F3dPk3ZrD8NVa53NtvMRxHKSsourRbZ
yp4LKZV+POKHPFglO4jhLymhyeiwb1qgA8wssk7EKu0bwPeOcER4Tpewh1ib4C/C
sRRzj0Wlw6dyPCkyNKx23D7dF/DrnLmXLUBhGS2mu2htSlWOH6w6rFQoVSNGGy9T
DKUlUVUPG80mgYdME6NLJ27GOGQzxoAvzWgpcL6dJs9jz8nQqABJeXvdjw/vc/XH
AOaKy4VwE3qf0W106JpUb+a/q0RJf7w3o4c1vLc/AZwpshNBOsrJBqrTk2E5Nrhd
mcQykaF++DbLPIyTqhHl0GpKapohThESyMvfc4WRBFBaCwgFdOY/t0Gz3GA2N8Jc
fuq9NOB1bfouaFGfzdkZ7RZJi3lFqZfv/XiJCh/knp1/lHAQPo4TuADcFDsjeEc9
yr48SRDnCqahAQ7bUP0b5i31SZzwAYb/HnwYuvf4LWFvHl9XG5A=
=AKM7
-----END PGP SIGNATURE-----
Merge tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for
ENETC
Current release - bugs in new features:
- mptcp: provide rmem[0] limit offset to fix oops
Previous release - regressions:
- IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU
calculations
- lan743x: correctly handle chips with internal PHY
- bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
- mlx5e: Fix VXLAN port table synchronization after function reload
Previous release - always broken:
- bpf: Zero-fill re-used per-cpu map element
- fix out-of-order UDP packets when forwarding with UDP GSO fraglists
turned on:
- fix UDP header access on Fast/frag0 UDP GRO
- fix IP header access and skb lookup on Fast/frag0 UDP GRO
- ethtool: netlink: add missing netdev_features_change() call
- net: Update window_clamp if SOCK_RCVBUF is set
- igc: Fix returning wrong statistics
- ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload
- tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies
- r8169: disable hw csum for short packets on all chip versions
- vrf: Fix fast path output packet handling with async Netfilter
rules"
* tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
lan743x: fix use of uninitialized variable
net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO
net: udp: fix UDP header access on Fast/frag0 UDP GRO
devlink: Avoid overwriting port attributes of registered port
vrf: Fix fast path output packet handling with async Netfilter rules
cosa: Add missing kfree in error path of cosa_write
net: switch to the kernel.org patchwork instance
ch_ktls: stop the txq if reaches threshold
ch_ktls: tcb update fails sometimes
ch_ktls/cxgb4: handle partial tag alone SKBs
ch_ktls: don't free skb before sending FIN
ch_ktls: packet handling prior to start marker
ch_ktls: Correction in middle record handling
ch_ktls: missing handling of header alone
ch_ktls: Correction in trimmed_len calculation
cxgb4/ch_ktls: creating skbs causes panic
ch_ktls: Update cheksum information
ch_ktls: Correction in finding correct length
cxgb4/ch_ktls: decrypted bit is not enough
net/x25: Fix null-ptr-deref in x25_connect
...
udp{4,6}_lib_lookup_skb() use ip{,v6}_hdr() to get IP header of the
packet. While it's probably OK for non-frag0 paths, this helpers
will also point to junk on Fast/frag0 GRO when all headers are
located in frags. As a result, sk/skb lookup may fail or give wrong
results. To support both GRO modes, skb_gro_network_header() might
be used. To not modify original functions, add private versions of
udp{4,6}_lib_lookup_skb() only to perform correct sk lookups on GRO.
Present since the introduction of "application-level" UDP GRO
in 4.7-rc1.
Misc: replace totally unneeded ternaries with plain ifs.
Fixes: a6024562ff ("udp: Add GRO functions to UDP socket")
Suggested-by: Willem de Bruijn <willemb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's
probably OK for non-frag0 paths (when all headers or even the entire
frame are already in skb head), this inline points to junk when
using Fast GRO (napi_gro_frags() or napi_gro_receive() with only
Ethernet header in skb head and all the rest in the frags) and breaks
GRO packet compilation and the packet flow itself.
To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
are typically used. UDP even has an inline helper that makes use of
them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
to get rid of the out-of-order delivers.
Present since the introduction of plain UDP GRO in 5.0-rc1.
Fixes: e20cf8d3f1 ("udp: implement GRO for plain UDP sockets.")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cited commit in fixes tag overwrites the port attributes for the
registered port.
Avoid such error by checking registered flag before setting attributes.
Fixes: 71ad8d55f8 ("devlink: Replace devlink_port_attrs_set parameters with a struct")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20201111034744.35554-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some drivers fill the status rate list without setting the rate index after
the final rate to -1. minstrel_ht already deals with this, but minstrel
doesn't, which causes it to get stuck at the lowest rate on these drivers.
Fix this by checking the count as well.
Cc: stable@vger.kernel.org
Fixes: cccf129f82 ("mac80211: add the 'minstrel' rate control algorithm")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20201111183359.43528-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Deferring sampling attempts to the second stage has some bad interactions
with drivers that process the rate table in hardware and use the probe flag
to indicate probing packets (e.g. most mt76 drivers). On affected drivers
it can lead to probing not working at all.
If the link conditions turn worse, it might not be such a good idea to
do a lot of sampling for lower rates in this case.
Fix this by simply skipping the sample attempt instead of deferring it,
but keep the checks that would allow it to be sampled if it was skipped
too often, but only if it has less than 95% success probability.
Also ensure that IEEE80211_TX_CTL_RATE_CTRL_PROBE is set for all probing
packets.
Cc: stable@vger.kernel.org
Fixes: cccf129f82 ("mac80211: add the 'minstrel' rate control algorithm")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20201111183359.43528-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
After the status rework, ieee80211_tx_status_ext is leaking un-acknowledged
packets for stations in powersave mode.
To fix this, move the code handling those packets from __ieee80211_tx_status
into ieee80211_tx_status_ext
Reported-by: Tobias Waldvogel <tobias.waldvogel@gmail.com>
Fixes: 3318111cf6 ("mac80211: reduce duplication in tx status functions")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20201111183359.43528-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a device is getting removed or reprobed during resume, use-after-free
might happen. For example, h5_btrtl_resume() schedules a work queue for
device reprobing, which of course requires removal first.
If the removal happens in parallel with the device_resume() and wins the
race to acquire device_lock(), removal may remove the device from the PM
lists and all, but device_resume() is already running and will continue
when the lock can be acquired, thus calling rfkill_resume().
During this, if rfkill_set_block() is then called after the corresponding
*_unregister() and kfree() are called, there will be an use-after-free
in hci_rfkill_set_block():
BUG: KASAN: use-after-free in hci_rfkill_set_block+0x58/0xc0 [bluetooth]
...
Call trace:
dump_backtrace+0x0/0x154
show_stack+0x20/0x2c
dump_stack+0xbc/0x12c
print_address_description+0x88/0x4b0
__kasan_report+0x144/0x168
kasan_report+0x10/0x18
check_memory_region+0x19c/0x1ac
__kasan_check_write+0x18/0x24
hci_rfkill_set_block+0x58/0xc0 [bluetooth]
rfkill_set_block+0x9c/0x120
rfkill_resume+0x34/0x70
dpm_run_callback+0xf0/0x1f4
device_resume+0x210/0x22c
Fix this by checking rfkill->registered in rfkill_resume(). device_del()
in rfkill_unregister() requires device_lock() and the whole rfkill_resume()
is also protected by the same lock via device_resume(), we can make sure
either the rfkill->registered is false before rfkill_resume() starts or the
rfkill device won't be unregistered before rfkill_resume() returns.
As async_resume() holds a reference to the device, at this level there can
be no use-after-free; only in the user that doesn't expect this scenario.
Fixes: 8589086f4e ("Bluetooth: hci_h5: Turn off RTL8723BS on suspend, reprobe on resume")
Signed-off-by: Claire Chang <tientzu@chromium.org>
Link: https://lore.kernel.org/r/20201110084908.219088-1-tientzu@chromium.org
[edit commit message for clarity and add more info provided later]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Be more consistent about the way in which the nexthop flags are set and
set them in one go.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201110102553.1924232-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The disable_policy and disable_xfrm are a per-interface sysctl to
disable IPsec policy or encryption on an interface. However, while a
"all" variant is exposed, it was a noop since it was never evaluated.
We use the usual "or" logic for this kind of sysctls.
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Distant QRTR nodes can be accessed via an other node that acts as
a bridge. When the a QRTR endpoint associated to a bridge node is
released, all the linked distant nodes should also be released.
This patch fixes endpoint release by:
- Submitting QRTR BYE message locally on behalf of all the nodes
accessible through the endpoint.
- Removing all the routable node IDs from radix tree pointing to
the released node endpoint.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This will be requested for allocating control packet in atomic context.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to reach non-immediate remote node services that are
accessed through an intermediate node, the route to the remote
node needs to be saved.
E.g for a [node1 <=> node2 <=> node3] network
- node2 forwards node3 service to node1
- node1 must save node2 as route for reaching node3
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A remote endpoint (immediate neighbors node) can forward services
from other nodes (non-immadiate), in that case ctrl packet node ID
(offering distant service) can differ from the qrtr source node
(forwarding the packet).
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The port ID for control messages was uncorrectly set with broadcast
node ID value, causing message to be dropped on remote side since
not passing packet filtering (cb->dst_port != QRTR_PORT_CTRL).
Fixes: d27e77a3de ("net: qrtr: Reset the node and port ID of broadcast messages")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This fixes a regression for blocking connects introduced by commit
4becb7ee5b ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect").
The x25->neighbour is already set to "NULL" by x25_disconnect() now,
while a blocking connect is waiting in
x25_wait_for_connection_establishment(). Therefore x25->neighbour must
not be accessed here again and x25->state is also already set to
X25_STATE_0 by x25_disconnect().
Fixes: 4becb7ee5b ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Reviewed-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This replaces the BT_DBG function to bt_dev_dbg as it is cleaner to show
the controller index in the debug message.
Signed-off-by: Howard Chung <howardchung@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
According to the spec Ver 5.2, Vol 3, Part C, Sec 5.2.2.8:
Device in security mode 4 level 4 shall enforce:
128-bit equivalent strength for link and encryption keys required
using FIPS approved algorithms (E0 not allowed, SAFER+ not allowed,
and P-192 not allowed; encryption key not shortened)
This patch rejects connection with key size below 16 for FIPS
level services.
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
syzbot reported the following KASAN finding:
BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
Read of size 2 at addr 000000000000021e by task syz-executor907/519
CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
[<00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
[<00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135
[<00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline]
[<00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118
[<00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383
[<00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline]
[<00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534
[<00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
[<00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457
[<00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204
[<00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline]
[<00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210
[<00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
There is nothing to shutdown if a connection has never been established.
Besides that iucv->hs_dev is not yet initialized if a socket is in
IUCV_OPEN state and iucv->path is not yet initialized if socket is in
IUCV_BOUND state.
So, just skip the shutdown calls for a socket in these states.
Fixes: eac3731bd0 ("[S390]: Add AF_IUCV socket support")
Fixes: 82492a355f ("af_iucv: add shutdown for HS transport")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
[jwi: correct one Fixes tag]
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The skb is needed only to fetch the keys for the lookup.
Both functions are used from GRO stack, we do not want
accidental modification of the skb.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When net.ipv4.tcp_syncookies=1 and syn flood is happened,
cookie_v4_check or cookie_v6_check tries to redo what
tcp_v4_send_synack or tcp_v6_send_synack did,
rsk_window_clamp will be changed if SOCK_RCVBUF is set,
which will make rcv_wscale is different, the client
still operates with initial window scale and can overshot
granted window, the client use the initial scale but local
server use new scale to advertise window value, and session
work abnormally.
Fixes: e88c64f0a4 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt")
Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some typos are found out by misspell-fixer tool:
$ misspell-fixer -rnv ./net/sched/
./net/sched/act_api.c:686
./net/sched/act_bpf.c:68
./net/sched/cls_rsvp.h:241
./net/sched/em_cmp.c:44
./net/sched/sch_pie.c:408
Fix typos found by misspell-fixer.
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/5fa8e9d4.1c69fb81.5d889.5c64@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current NetLabel code doesn't correctly keep track of the netlink
dump state in some cases, in particular when multiple interfaces with
large configurations are loaded. The problem manifests itself by not
reporting the full configuration to userspace, even though it is
loaded and active in the kernel. This patch fixes this by ensuring
that the dump state is properly reset when necessary inside the
netlbl_unlabel_staticlist() function.
Fixes: 8cc44579d1 ("NetLabel: Introduce static network labels for unlabeled connections")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/160484450633.3752.16512718263560813473.stgit@sifl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The initialization for 'err' with '-EINVAL' is redundant and
can be removed, as it is updated soon.
Changes since v1:
- Remove redundant empty line
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/20201108010541.12432-1-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to commit fda55eca5a
("net: introduce skb_transport_header_was_set()"), avoid resetting
transport offsets that were already set by GRO layer. This not only
mirrors the behavior of __netif_receive_skb_core(), but also makes
sense when it comes to UDP GSO fraglists forwarding: transport offset
of such skbs is set only once by GRO receive callback and remains
untouched and correct up to the xmitting driver in 1:1 case, but
becomes junk after untagging in ingress VLAN case and breaks UDP
GSO offload. This does not happen after this change, and all types
of forwarding of UDP GSO fraglists work as expected.
Since v1 [1]:
- keep the code 1:1 with __netif_receive_skb_core() (Jakub).
[1] https://lore.kernel.org/netdev/zYurwsZRN7BkqSoikWQLVqHyxz18h4LhHU4NFa2Vw@cp4-web-038.plabs.ch
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/7JgIkgEztzt0W6ZtC9V9Cnk5qfkrUFYcpN871syCi8@cp4-web-040.plabs.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After having migrated all users remove ip_tunnel_get_stats64().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace ip_tunnel_get_stats64() with the new identical core function
dev_get_tstats64().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace ip_tunnel_get_stats64() with the new identical core function
dev_get_tstats64().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Switch ip6_tunnel to the standard statistics pattern:
- use dev->stats for the less frequently accessed counters
- use dev->tstats for the frequently accessed counters
An additional benefit is that we now have 64bit statistics also on
32bit systems.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use netdev->tstats instead of a member of dsa_slave_priv for storing
a pointer to the per-cpu counters. This allows us to use core
functionality for statistics handling.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It's a frequent pattern to use netdev->stats for the less frequently
accessed counters and per-cpu counters for the frequently accessed
counters (rx/tx bytes/packets). Add a default ndo_get_stats64()
implementation for this use case.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mptcp proto struct currently does not provide the
required limit for forward memory scheduling. Under
pressure sk_rmem_schedule() will unconditionally try
to use such field and will oops.
Address the issue inheriting the tcp limit, as we already
do for the wmem one.
Fixes: 9c3f94e168 ("mptcp: add missing memory scheduling in the rx path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/37af798bd46f402fb7c79f57ebbdd00614f5d7fa.1604861097.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After updating userspace Ethtool from 5.7 to 5.9, I noticed that
NETDEV_FEAT_CHANGE is no more raised when changing netdev features
through Ethtool.
That's because the old Ethtool ioctl interface always calls
netdev_features_change() at the end of user request processing to
inform the kernel that our netdevice has some features changed, but
the new Netlink interface does not. Instead, it just notifies itself
with ETHTOOL_MSG_FEATURES_NTF.
Replace this ethtool_notify() call with netdev_features_change(), so
the kernel will be aware of any features changes, just like in case
with the ioctl interface. This does not omit Ethtool notifications,
as Ethtool itself listens to NETDEV_FEAT_CHANGE and drops
ETHTOOL_MSG_FEATURES_NTF on it
(net/ethtool/netlink.c:ethnl_netdev_event()).
From v1 [1]:
- dropped extra new line as advised by Jakub;
- no functional changes.
[1] https://lore.kernel.org/netdev/AlZXQ2o5uuTVHCfNGOiGgJ8vJ3KgO5YIWAnQjH0cDE@cp3-web-009.plabs.ch
Fixes: 0980bfcd69 ("ethtool: set netdev features with FEATURES_SET request")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/ahA2YWXYICz5rbUSQqNG4roJ8OlJzzYQX7PTiG80@cp4-web-028.plabs.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace list_head with hlist_head for MRP list under the bridge.
There is no need for a circular list when a linear list will work.
This will also decrease the size of 'struct net_bridge'.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20201106215049.1448185-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One use case of PACKET_FANOUT is lockless reception with one socket
per CPU. 256 is a practical limit on increasingly many machines.
Increase PACKET_FANOUT_MAX to 64K. Expand setsockopt PACKET_FANOUT to
take an extra argument max_num_members. Also explicitly define a
fanout_args struct, instead of implicitly casting to an integer. This
documents the API and simplifies the control flow.
If max_num_members is not specified or is set to 0, then 256 is used,
same as before.
Signed-off-by: Tanner Love <tannerlove@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianlin reports that a bridged IPv6 VXLAN endpoint, carrying IPv6
packets over a link with a PMTU estimation of exactly 1350 bytes,
won't trigger ICMPv6 Packet Too Big replies when the encapsulated
datagrams exceed said PMTU value. VXLAN over IPv6 adds 70 bytes of
overhead, so an ICMPv6 reply indicating 1280 bytes as inner MTU
would be legitimate and expected.
This comes from an off-by-one error I introduced in checks added
as part of commit 4cb47a8644 ("tunnels: PMTU discovery support
for directly bridged IP packets"), whose purpose was to prevent
sending ICMPv6 Packet Too Big messages with an MTU lower than the
smallest permissible IPv6 link MTU, i.e. 1280 bytes.
In iptunnel_pmtud_check_icmpv6(), avoid triggering a reply only if
the advertised MTU would be less than, and not equal to, 1280 bytes.
Also fix the analogous comparison for IPv4, that is, skip the ICMP
reply only if the resulting MTU is strictly less than 576 bytes.
This becomes apparent while running the net/pmtu.sh bridged VXLAN
or GENEVE selftests with adjusted lower-link MTU values. Using
e.g. GENEVE, setting ll_mtu to the values reported below, in the
test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() test
function, we can see failures on the following tests:
test | ll_mtu
-------------------------------|--------
pmtu_ipv4_br_geneve4_exception | 626
pmtu_ipv6_br_geneve4_exception | 1330
pmtu_ipv6_br_geneve6_exception | 1350
owing to the different tunneling overheads implied by the
corresponding configurations.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://lore.kernel.org/r/4f5fc2f33bfdf8409549fafd4f952b008bf04d63.1604681709.git.sbrivio@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When udp_memory_allocated is at the limit, __udp_enqueue_schedule_skb
will return a -ENOBUFS, and skb will be dropped in __udp_queue_rcv_skb
without any counters being done. It's hard to find out what happened
once this happen.
So we introduce a UDP_MIB_MEMERRORS to do this job. Well, this change
looks friendly to the existing users, such as netstat:
$ netstat -u -s
Udp:
0 packets received
639 packets to unknown port received.
158689 packet receive errors
180022 packets sent
RcvbufErrors: 20930
MemErrors: 137759
UdpLite:
IpExt:
InOctets: 257426235
OutOctets: 257460598
InNoECTPkts: 181177
v2:
- Fix some alignment problems
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/1604627354-43207-1-git-send-email-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to the legacy usage of hard_header_len for SIT tunnels while
already using infrastructure from net/ipv4/ip_tunnel.c the
calculation of the path MTU in tnl_update_pmtu is incorrect.
This leads to unnecessary creation of MTU exceptions for any
flow going over a SIT tunnel.
As SIT tunnels do not have a header themsevles other than their
transport (L3, L2) headers we're leaving hard_header_len set to zero
as tnl_update_pmtu is already taking care of the transport headers
sizes.
This will also help avoiding unnecessary IPv6 GC runs and spinlock
contention seen when using SIT tunnels and for more than
net.ipv6.route.gc_thresh flows.
Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201103104133.GA1573211@tws
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
refactoring.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl+pk7QVHGJmaWVsZHNA
ZmllbGRzZXMub3JnAAoJECebzXlCjuG+lwgQAL0WE92H1QJwYtrC5bXko1CjXjL7
I1lv/rMf1ZHhdbZLZQNSqXFYTGrO3w6n02H7bJcYlryg5YSt8i8evdJXICYyeZIX
5QAT0K5hzHTNWKnumqBSwoVOPl1e6ImZtmyxqQvA/2sQP18OPvroK/9H0YkdnM3/
d8lcpKTBCJj0UAWmktaXGYG8PdNSjaNXMfPRwpCOGHiXk+QBAb+QjshB54PKjjhR
aiJTJzceroLer0YlQSXfVQMt6EwkTkjCbMbxPywfFYGGvl/Y7H4YgVA8rYqO/XZr
BmP9V+xX87GyB0IEGxoheVcmTMUSw37JUfAC2oBQB9g2emG5avRn4vdhL25nKd1T
sgaVC+0tnoMQ7KNaYp1SK6orgS+OIYeQLhxbu6jmU+viccJ621JmpRF+95OwEZ9Z
4+vBwI3Oft20jndgNwrTvCLgkzEVFpJuayBeZCk7pvchM2YjaWwl291ix+cwM2wQ
fwMVs6dpLIgfB8jNOM6qAfI1jB1HMePrPraqxddxh5tZ0Tt4C4uwpEIDDwaPesmJ
FK3JB+7GpU/tMHmmaeVFUMGx9V+8fJFEC0MFUrrqAMZ3XbzQ+DM5ysk1TQsO0OEO
F1ojiYNW8s4U+dLCY0S16vFVoQIuM9Ui1zXGaJHQgS04l+cFCmD495s4HtYA1k7l
H/T/o416bZlbOhcK
=bpPt
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"This is mainly server-to-server copy and fallout from Chuck's 5.10 rpc
refactoring"
* tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux:
net/sunrpc: fix useless comparison in proc_do_xprt()
net/sunrpc: return 0 on attempt to write to "transports"
NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy
NFSD: Fix use-after-free warning when doing inter-server copy
NFSD: MKNOD should return NFSERR_BADTYPE instead of NFSERR_INVAL
SUNRPC: Fix general protection fault in trace_rpc_xdr_overflow()
NFSD: NFSv3 PATHCONF Reply is improperly formed
A different wait queue was used when removing ctrl_wait than when adding
it. This effectively made the remove operation without locking compared
to other operations on the wait queue ctrl_wait was part of. This caused
issues like below where dead000000000100 is LIST_POISON1 and
dead000000000200 is LIST_POISON2.
list_add corruption. next->prev should be prev (ffffffc1b0a33a08), \
but was dead000000000200. (next=ffffffc03ac77de0).
------------[ cut here ]------------
CPU: 3 PID: 2138 Comm: bluetoothd Tainted: G O 4.4.238+ #9
...
---[ end trace 0adc2158f0646eac ]---
Call trace:
[<ffffffc000443f78>] __list_add+0x38/0xb0
[<ffffffc0000f0d04>] add_wait_queue+0x4c/0x68
[<ffffffc00020eecc>] __pollwait+0xec/0x100
[<ffffffc000d1556c>] bt_sock_poll+0x74/0x200
[<ffffffc000bdb8a8>] sock_poll+0x110/0x128
[<ffffffc000210378>] do_sys_poll+0x220/0x480
[<ffffffc0002106f0>] SyS_poll+0x80/0x138
[<ffffffc00008510c>] __sys_trace_return+0x0/0x4
Unable to handle kernel paging request at virtual address dead000000000100
...
CPU: 4 PID: 5387 Comm: kworker/u15:3 Tainted: G W O 4.4.238+ #9
...
Call trace:
[<ffffffc0000f079c>] __wake_up_common+0x7c/0xa8
[<ffffffc0000f0818>] __wake_up+0x50/0x70
[<ffffffc000be11b0>] sock_def_wakeup+0x58/0x60
[<ffffffc000de5e10>] l2cap_sock_teardown_cb+0x200/0x224
[<ffffffc000d3f2ac>] l2cap_chan_del+0xa4/0x298
[<ffffffc000d45ea0>] l2cap_conn_del+0x118/0x198
[<ffffffc000d45f8c>] l2cap_disconn_cfm+0x6c/0x78
[<ffffffc000d29934>] hci_event_packet+0x564/0x2e30
[<ffffffc000d19b0c>] hci_rx_work+0x10c/0x360
[<ffffffc0000c2218>] process_one_work+0x268/0x460
[<ffffffc0000c2678>] worker_thread+0x268/0x480
[<ffffffc0000c94e0>] kthread+0x118/0x128
[<ffffffc000085070>] ret_from_fork+0x10/0x20
---[ end trace 0adc2158f0646ead ]---
Signed-off-by: Ole Bjørn Midtbø <omidtbo@cisco.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Avoid multiple attempts to create the debugfs entry, force_bredr_smp,
by moving it from the SMP registration to the BR/EDR controller init
section. hci_debugfs_create_bredr is only called when HCI_SETUP and
HCI_CONFIG is not set.
Signed-off-by: Claire Chang <tientzu@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Reviewed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When adding device to white list the device is added to resolving list
also. It has to be added only when HCI_ENABLE_LL_PRIVACY flag is set.
HCI_ENABLE_LL_PRIVACY flag has to be tested before adding/deleting devices
to resolving list. use_ll_privacy macro is used only to check if controller
supports LL_Privacy.
https://bugzilla.kernel.org/show_bug.cgi?id=209745
Fixes: 0eee35bdfa ("Bluetooth: Update resolving list when updating whitelist")
Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When an LE connection request is made, advertising is disabled and never
resumed. When a client has an active advertisement, this is disruptive.
This change adds resume logic for client-configured (non-directed)
advertisements after the connection attempt.
The patch was tested by registering an advertisement, initiating an LE
connection from a remote peer, and verifying that the advertisement is
re-activated after the connection is established. This is performed on
Hatch and Kukui Chromebooks.
Signed-off-by: Daniel Winkler <danielwinkler@google.com>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
AMP_MGR is getting derefernced in hci_phy_link_complete_evt(), when called
from hci_event_packet() and there is a possibility, that hcon->amp_mgr may
not be found when accessing after initialization of hcon.
- net/bluetooth/hci_event.c:4945
The bug seems to get triggered in this line:
bredr_hcon = hcon->amp_mgr->l2cap_conn->hcon;
Fix it by adding a NULL check for the hcon->amp_mgr before checking the ev-status.
Fixes: d5e911928b ("Bluetooth: AMP: Process Physical Link Complete evt")
Reported-and-tested-by: syzbot+0bef568258653cff272f@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=0bef568258653cff272f
Signed-off-by: Anmol Karn <anmol.karan123@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In the original code, the "if (*lenp < 0)" check didn't work because
"*lenp" is unsigned. Fortunately, the memory_read_from_buffer() call
will never fail in this context so it doesn't affect runtime.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2020-11-06
1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David.
2) Tighten bpf_lsm function check, from KP.
3) Fix bpftool attaching to flow dissector, from Lorenz.
4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard.
* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Update verification logic for LSM programs
bpf: Zero-fill re-used per-cpu map element
bpf: BPF_PRELOAD depends on BPF_SYSCALL
tools/bpftool: Fix attaching flow dissector
libbpf: Fix possible use after free in xsk_socket__delete
libbpf: Fix null dereference in xsk_socket__delete
libbpf, hashmap: Fix undefined behavior in hash_bits
bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
tools, bpftool: Remove two unused variables.
tools, bpftool: Avoid array index warnings.
xsk: Fix possible memory leak at socket close
bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs
samples/bpf: Set rlimit for memlock to infinity in all samples
bpf: Fix -Wshadow warnings
selftest/bpf: Fix profiler test using CO-RE relocation for enums
====================
Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
You can't write to this file because the permissions are 0444. But
it sort of looked like you could do a write and it would result in
a read. Then it looked like proc_sys_call_handler() just ignored
it. Which is confusing. It's more clear if the "write" just
returns zero.
Also, the "lenp" pointer is never NULL so that check can be removed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
and netfilter subtrees.
Current release - bugs in new features:
- can: isotp: isotp_rcv_cf(): enable RX timeout handling in
listen-only mode
Previous release - regressions:
- mac80211:
- don't require VHT elements for HE on 2.4 GHz
- fix regression where EAPOL frames were sent in plaintext
- netfilter:
- ipset: Update byte and packet counters regardless of whether
they match
- ip_tunnel: fix over-mtu packet send by allowing fragmenting even
if inner packet has IP_DF (don't fragment) set in its header
(when TUNNEL_DONT_FRAGMENT flag is not set on the tunnel dev)
- net: fec: fix MDIO probing for some FEC hardware blocks
- ip6_tunnel: set inner ipproto before ip6_tnl_encap to un-break
gso support
- sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian
platforms, sparse-related fix used the wrong integer size
Previous release - always broken:
- netfilter: use actual socket sk rather than skb sk when routing
harder
- r8169: work around short packet hw bug on RTL8125 by padding frames
- net: ethernet: ti: cpsw: disable PTPv1 hw timestamping
advertisement, the hardware does not support it
- chelsio/chtls: fix always leaking ctrl_skb and another leak caused
by a race condition
- fix drivers incorrectly writing into skbs on TX:
- cadence: force nonlinear buffers to be cloned
- gianfar: Account for Tx PTP timestamp in the skb headroom
- gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
- can: flexcan:
- remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
- add ECC initialization for VF610 and LX2160A
- flexcan_remove(): disable wakeup completely
- can: fix packet echo functionality:
- peak_canfd: fix echo management when loopback is on
- make sure skbs are not freed in IRQ context in case they need
to be dropped
- always clone the skbs to make sure they have a reference on
the socket, and prevent it from disappearing
- fix real payload length return value for RTR frames
- can: j1939: return failure on bind if netdev is down, rather than
waiting indefinitely
Misc:
- IPv6: reply ICMP error if the first fragment don't include all
headers to improve compliance with RFC 8200
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+kTDcACgkQMUZtbf5S
IrtC9A//f9rwNFI7sRaz9FYi6ljtWY7paPxdOxy3pWRoNzbfffjTGSPheNvy1pQb
IPaLsNwRrckQNSEPTbQqlUYcjzk1W74ffvq0sQOan4kNKxjX3uf78E6RuWARJsRC
dLqfcJctO6bFi6sEMwIFZ2tLOO5lUIA+Pd0GbjhSdObWzl3uqJ26v7wC6vVk29vS
116Mmhe8/TDVtCOzwlZnBPHqBJkTAirB+MAEX4Sp6FB9YirlcNZbWyHX5L6ejGqC
WQVjU2tPBBugeo0j72tc+y0mD3iK0aLcPL+dk0EQQYHRDMVTebl+gxNPUXCo9Out
HGe5z4e4qrR4Rx1W6MQ3pKwTYuCdwKjMRGd72JAi428/l4NN3y9W/HkI2Zuppd2l
7ifURkNQllYjGCSoHBviJbajyFBeA1nkFJgMSJiRs4T167K3zTbsyjNnfa4LnsvS
B3SrYMGqIH+oR20R9EoV8prVX+Alj1hh/jX02J8zsCcHmBqF2yZi17NarVAWoarm
v/AAqehlP+D1vjAmbCG9DeborrjaNi+v6zFTKK6ZadvLXRJX/N+wEPIpG4KjiK8W
DWKIVlee0R+kgCXE1n9AuZaZLWb7VwrAjkG1Pmfi3vkZhWeAhOW4X98ehhi/hVR/
Gq+e48ZECW5yuOA1q4hbsCYkGr2qAn/LPbsXxhEmW8qwkJHZYkI=
=5R2w
-----END PGP SIGNATURE-----
Merge tag 'net-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc3, including fixes from wireless, can, and
netfilter subtrees.
Current merge window - bugs in new features:
- can: isotp: isotp_rcv_cf(): enable RX timeout handling in
listen-only mode
Previous releases - regressions:
- mac80211:
- don't require VHT elements for HE on 2.4 GHz
- fix regression where EAPOL frames were sent in plaintext
- netfilter:
- ipset: Update byte and packet counters regardless of whether
they match
- ip_tunnel: fix over-mtu packet send by allowing fragmenting even if
inner packet has IP_DF (don't fragment) set in its header (when
TUNNEL_DONT_FRAGMENT flag is not set on the tunnel dev)
- net: fec: fix MDIO probing for some FEC hardware blocks
- ip6_tunnel: set inner ipproto before ip6_tnl_encap to un-break gso
support
- sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian
platforms, sparse-related fix used the wrong integer size
Previous releases - always broken:
- netfilter: use actual socket sk rather than skb sk when routing
harder
- r8169: work around short packet hw bug on RTL8125 by padding frames
- net: ethernet: ti: cpsw: disable PTPv1 hw timestamping
advertisement, the hardware does not support it
- chelsio/chtls: fix always leaking ctrl_skb and another leak caused
by a race condition
- fix drivers incorrectly writing into skbs on TX:
- cadence: force nonlinear buffers to be cloned
- gianfar: Account for Tx PTP timestamp in the skb headroom
- gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
- can: flexcan:
- remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
- add ECC initialization for VF610 and LX2160A
- flexcan_remove(): disable wakeup completely
- can: fix packet echo functionality:
- peak_canfd: fix echo management when loopback is on
- make sure skbs are not freed in IRQ context in case they need to
be dropped
- always clone the skbs to make sure they have a reference on the
socket, and prevent it from disappearing
- fix real payload length return value for RTR frames
- can: j1939: return failure on bind if netdev is down, rather than
waiting indefinitely
Misc:
- IPv6: reply ICMP error if the first fragment don't include all
headers to improve compliance with RFC 8200"
* tag 'net-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
ionic: check port ptr before use
r8169: work around short packet hw bug on RTL8125
net: openvswitch: silence suspicious RCU usage warning
chelsio/chtls: fix always leaking ctrl_skb
chelsio/chtls: fix memory leaks caused by a race
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
...
Remove in-kernel route notifications when the configuration of their
nexthop changes.
These notifications are unnecessary because the route still uses the
same nexthop ID. A separate notification for the nexthop change itself
is now sent in the nexthop notification chain.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When registering a new notifier to the nexthop notification chain,
replay all the existing nexthops to the new notifier so that it will
have a complete picture of the available nexthops.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This will be used by the next patch which extends the function to replay
all the existing nexthops to the notifier block being registered.
Device drivers will be able to pass extack to the function since it is
passed to them upon reload from devlink.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a single nexthop is deleted, the configuration of all the groups
using the nexthop is effectively modified. In this case, emit a
notification in the nexthop notification chain for each modified group
so that listeners would not need to keep track of which nexthops are
member in which groups.
In the rare cases where the notification fails, emit an error to the
kernel log. This is done by allocating extack on the stack and printing
the error logged by the listener that rejected the notification.
Changes since RFC:
* Allocate extack on the stack
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a single nexthop is replaced, the configuration of all the groups
using the nexthop is effectively modified. In this case, emit a
notification in the nexthop notification chain for each modified group
so that listeners would not need to keep track of which nexthops are
member in which groups.
The notification can only be emitted after the new configuration (i.e.,
'struct nh_info') is pointed at by the old shell (i.e., 'struct
nexthop'). Before that the configuration of the nexthop groups is still
the same as before the replacement.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The notification is emitted after all the validation checks were
performed, but before the new configuration (i.e., 'struct nh_info') is
pointed at by the old shell (i.e., 'struct nexthop'). This prevents the
need to perform rollback in case the notification is vetoed.
The next patch will also emit a replace notification for all the nexthop
groups in which the nexthop is used.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Emit a notification in the nexthop notification chain when an existing
nexthop group is replaced.
The notification is emitted after all the validation checks were
performed, but before the new configuration (i.e., 'struct nh_grp') is
pointed at by the old shell (i.e., 'struct nexthop'). This prevents the
need to perform rollback in case the notification is vetoed.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Emit a notification in the nexthop notification chain when a new nexthop
is added (not replaced). The nexthop can either be a new group or a
single nexthop.
The notification is sent after the nexthop is inserted into the
red-black tree, as listeners might need to callback into the nexthop
code with the nexthop ID in order to mark the nexthop as offloaded.
A 'REPLACE' notification is emitted instead of 'ADD' as the distinction
between the two is not important for in-kernel listeners. In case the
listener is not familiar with the encoded nexthop ID, it can simply
treat it as a new one. This is also consistent with the route offload
API.
Changes since RFC:
* Reword commit message
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a function that can be called by device drivers to set "offload" or
"trap" indication on nexthops following nexthop notifications.
Changes since RFC:
* s/nexthop_hw_flags_set/nexthop_set_hw_flags/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The flag indicates to user space that the nexthop is not programmed to
forward packets in hardware, but rather to trap them to the CPU. This is
needed, for example, when the MAC of the nexthop neighbour is not
resolved and packets should reach the CPU to trigger neighbour
resolution.
The flag will be used in subsequent patches by netdevsim to test nexthop
objects programming to device drivers and in the future by mlxsw as
well.
Changes since RFC:
* Reword commit message
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert the sole listener of the nexthop notification chain (the VXLAN
driver) to the new notification info.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare the new notification information so that it could be passed to
listeners in the new patch.
Changes since RFC:
* Add a blank line in __nh_notifier_single_info_init()
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The next patch will add extack to the notification info. This allows
listeners to veto notifications and communicate the reason to user space.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some drivers use skb->priority to determine on which queue to send
a frame. An example is mt76x2u (this was tested on an AWUS036ACM).
This means these drivers currently do not adhere to the DONT_REORDER
flag. To fix this, we do not set skb->priority based on the QoS TID
of injected frames when the DONT_REORDER flag is set.
Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20201104061823.197407-5-Mathy.Vanhoef@kuleuven.be
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently ieee80211_set_qos_hdr sets the QoS TID of all frames based
on the value assigned to skb->priority. This means it will also
overwrite the QoS TID of injected frames. The commit 753ffad3d6
("mac80211: fix TID field in monitor mode transmit") prevented
injected frames from being modified because of this by setting
skb->priority to the TID of the injected frame, which assured the
QoS TID will not be changed to a different value. Unfortunately,
this workaround complicates the handling of injected frames because
we can't set skb->priority without affecting the TID value in the
QoS field of injected frames.
To avoid this, and to simplify the next patch, detect if a frame is
injected in ieee80211_set_qos_hdr and if so do not change its QoS
field.
Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20201104061823.197407-4-Mathy.Vanhoef@kuleuven.be
[fix typos in commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the Tx control flag is set to prevent frame reordering, send
all frames that have this flag set on the same queue. This assures
that frames that have this flag set are not reordered relative to
other frames that have this flag set.
Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20201104061823.197407-3-Mathy.Vanhoef@kuleuven.be
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new radiotap flag to indicate injected frames must not be
reordered relative to other frames that also have this flag set,
independent of priority field values in the transmitted frame.
Parse this radiotap flag and define and set a corresponding Tx
control flag. Note that this flag has recently been standardized
as part of an update to radiotap.
Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20201104061823.197407-2-Mathy.Vanhoef@kuleuven.be
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently he_support is set only for AP mode. Storing this
information for mesh BSS as well helps driver to determine
HE support. Also save HE operation element params in BSS
conf so that drivers can access this for any configurations
instead of having to parse the beacon to fetch that info.
Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
Link: https://lore.kernel.org/r/20201020183111.25458-2-pradeepc@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
While adding HE MCS beacon tx rate support, it is observed that legacy
beacon tx rate in VHT hwsim test suite is failed. Whenever the
application doesn't explicitly set VHT/MCS rate attribute in fixed rate
command, by default all HE MCS masks are enabled in cfg80211. In beacon
fixed rate, more than one rate mask is not allowed. Fix that by not
setting all rate mask by default in case of beacon tx rate.
Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org>
Link: https://lore.kernel.org/r/1602879327-29488-1-git-send-email-rmanohar@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The for-loop iterates with a u8 loop counter and compares this
with the loop upper limit of request->n_ssids which is an int type.
There is a potential infinite loop if n_ssids is larger than the
u8 loop counter, so fix this by making the loop counter an int.
Addresses-Coverity: ("Infinite loop")
Fixes: c8cb5b854b ("nl80211/cfg80211: support 6 GHz scanning")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201029222407.390218-1-colin.king@canonical.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers supporting 4096-QAM rates as a vendor extension in HE mode need
to update the correct rate info to userspace while using 4096-QAM (MCS12
and MCS13) in HE mode. Add support to calculate bitrates of HE-MCS12 and
HE-MCS13 which represent the 4096-QAM modulation schemes. The MCS12 and
MCS13 bitrates are defined in IEEE P802.11be/D0.1.
In addition, scale up the bitrates by 3*2048 in order to accommodate
calculations for the new MCS12 and MCS13 rates without losing fraction
values.
Signed-off-by: Vamsi Krishna <vamsin@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20201029183457.7005-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support to configure SAE PWE preference from userspace to drivers in
both AP and STA modes. This is needed for cases where the driver takes
care of Authentication frame processing (SME in the driver) so that
correct enforcement of the acceptable PWE derivation mechanism can be
performed.
The userspace applications can pass the sae_pwe value using the
NL80211_ATTR_SAE_PWE attribute in the NL80211_CMD_CONNECT and
NL80211_CMD_START_AP commands to the driver. This allows selection
between the hunting-and-pecking loop and hash-to-element options for PWE
derivation. For backwards compatibility, this new attribute is optional
and if not included, the driver is notified of the value being
unspecified.
Signed-off-by: Rohan Dutta <drohan@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20201027100910.22283-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Replace commas with semicolons. Commas introduce unnecessary
variability in the code structure and are hard to see. What is done
is essentially described by the following Coccinelle semantic patch
(http://coccinelle.lip6.fr/):
// <smpl>
@@ expression e1,e2; @@
e1
-,
+;
e2
... when any
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/1602412498-32025-3-git-send-email-Julia.Lawall@inria.fr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
inet(6)_skb_parm was removed from sctp_input_cb by Commit a1dd2cf2f1
("sctp: allow changing transport encap_port by peer packets"), as it
thought sctp_input_cb->header is not used any more in SCTP.
syzbot reported a crash:
[ ] BUG: KASAN: use-after-free in decode_session6+0xe7c/0x1580
[ ]
[ ] Call Trace:
[ ] <IRQ>
[ ] dump_stack+0x107/0x163
[ ] kasan_report.cold+0x1f/0x37
[ ] decode_session6+0xe7c/0x1580
[ ] __xfrm_policy_check+0x2fa/0x2850
[ ] sctp_rcv+0x12b0/0x2e30
[ ] sctp6_rcv+0x22/0x40
[ ] ip6_protocol_deliver_rcu+0x2e8/0x1680
[ ] ip6_input_finish+0x7f/0x160
[ ] ip6_input+0x9c/0xd0
[ ] ipv6_rcv+0x28e/0x3c0
It was caused by sctp_input_cb->header/IP6CB(skb) still used in sctp rx
path decode_session6() but some members overwritten by sctp6_rcv().
This patch is to fix it by bring inet(6)_skb_parm back to sctp_input_cb
and not overwriting it in sctp4/6_rcv() and sctp_udp_rcv().
Reported-by: syzbot+5be8aebb1b7dfa90ef31@syzkaller.appspotmail.com
Fixes: a1dd2cf2f1 ("sctp: allow changing transport encap_port by peer packets")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/136c1a7a419341487c504be6d1996928d9d16e02.1604472932.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some switches rely on unique pvids to ensure port separation in
standalone mode, because they don't have a port forwarding matrix
configurable in hardware. So, setups like a group of 2 uppers with the
same VLAN, swp0.100 and swp1.100, will cause traffic tagged with VLAN
100 to be autonomously forwarded between these switch ports, in spite
of there being no bridge between swp0 and swp1.
These drivers need to prevent this from happening. They need to have
VLAN filtering enabled in standalone mode (so they'll drop frames tagged
with unknown VLANs) and they can only accept an 8021q upper on a port as
long as it isn't installed on any other port too. So give them the
chance to veto bad user requests.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
[Kurt: Pass info instead of ptr]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Hirschmann Hellcreek TSN switches have a special tagging protocol for frames
exchanged between the CPU port and the master interface. The format is a one
byte trailer indicating the destination or origin port.
It's quite similar to the Micrel KSZ tagging. That's why the implementation is
based on that code.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow user to request action terse dump with new flag value
TCA_FLAG_TERSE_DUMP. Only output essential action info in terse dump (kind,
stats, index and cookie, if set by the user when creating the action). This
is different from filter terse dump where index is excluded (filter can be
identified by its own handle).
Move tcf_action_dump_terse() function to the beginning of source file in
order to call it from tcf_dump_walker().
Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/r/20201102201243.287486-1-vlad@buslov.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
1) Move existing bridge packet reject infra to nf_reject_{ipv4,ipv6}.c
from Jose M. Guisado.
2) Consolidate nft_reject_inet initialization and dump, also from Jose.
3) Add the netdev reject action, from Jose.
4) Allow to combine the exist flag and the destroy command in ipset,
from Joszef Kadlecsik.
5) Expose bucket size parameter for hashtables, also from Jozsef.
6) Expose the init value for reproducible ipset listings, from Jozsef.
7) Use __printf attribute in nft_request_module, from Andrew Lunn.
8) Allow to use reject from the inet ingress chain.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
netfilter: nft_reject_inet: allow to use reject from inet ingress
netfilter: nftables: Add __printf() attribute
netfilter: ipset: Expose the initval hash parameter to userspace
netfilter: ipset: Add bucketsize parameter to all hash types
netfilter: ipset: Support the -exist flag with the destroy command
netfilter: nft_reject: add reject verdict support for netdev
netfilter: nft_reject: unify reject init and dump into nft_reject
netfilter: nf_reject: add reject skbuff creation helpers
====================
Link: https://lore.kernel.org/r/20201104141149.30082-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch added a new sysctl, named add_addr_timeout, to control the
timeout value (in seconds) of the ADD_ADDR retransmission.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mptcp_clean_una() will wake writers in case memory could be reclaimed.
When called from mptcp_sendmsg the wakeup code isn't needed.
Move the wakeup to a new helper and then use that from the mptcp worker.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the TCP stack splits a packet on the write queue, the tail
half currently lose the associated skb extensions, and will not
carry the DSM on the wire.
The above does not cause functional problems and is allowed by
the RFC, but interact badly with GRO and RX coalescing, as possible
candidates for aggregation will carry different TCP options.
This change tries to improve the MPTCP behavior, propagating the
skb extensions on split.
Additionally, we must prevent the MPTCP stack from updating the
mapping after the split occur: that will both violate the RFC and
fool the reader.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function is short and won't sleep, so this can use the _fast version.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In addition to tcp autotuning during read, it may also increase the
receive buffer in tcp_clamp_window().
In this case, mptcp should adjust its receive buffer size as well so
it can move all pending skbs from the subflow socket to the mptcp socket.
At this time, TCP can have more skbs ready for processing than what the
mptcp receive buffer size allows.
In the mptcp case, the receive window announced is based on the free
space of the mptcp parent socket instead of the individual subflows.
Following the subflow allows mptcp to grow its receive buffer.
This is especially noticeable for loopback traffic where two skbs are
enough to fill the initial receive window.
In mptcp_data_ready() we do not hold the mptcp socket lock, so modifying
mptcp_sk->sk_rcvbuf is racy. Do it when moving skbs from subflow to
mptcp socket, both sockets are locked in this case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl+hzPwTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRCpyVqK+u3vqU8YB/9PBALnpZFDSyOE/8rKoBoqA2uPfj2i
Yplu98jkFqhnb5I1KFPCNJiTQd+/aAzM2LzeGHVOBMIF6scPUclC12k1q4fdLtX0
6YMZ38w2I2hq8z1QIgOYo7jQ34NeonNt7T5CHEeBA7xXGnlo/WYDNDE0cruPnPRZ
eFqM5f1/PVKKh4gFVTAqICC2ZMefL4rgAkFgFXj2rfiYr115OEGAwCav5Ys31p/y
MI5SfQmNkfkE8HswMNBDQZ+8V5qkKvarHXwUcRfgUqkpqHQjzcOIJnCDh/ngIh50
imwxHaCerXvEj8MBUcF2fZV7w6QPTFIV3TQ0AiUjuVUE3HPuR+JPvSl4
=B3CC
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-5.10-20201103' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-03
The first two patches are by Oleksij Rempel and they add a generic
can-controller Device Tree yaml binding and convert the text based binding
of the flexcan driver to a yaml based binding.
Zhang Changzhong's patch fixes a remove_proc_entry warning in the AF_CAN
core.
A patch by me fixes a kfree_skb() call from IRQ context in the rx-offload
helper.
Vincent Mailhol contributes a patch to prevent a call to kfree_skb() in
hard IRQ context in can_get_echo_skb().
Oliver Hartkopp's patch fixes the length calculation for RTR CAN frames
in the __can_get_echo_skb() helper.
Oleksij Rempel's patch fixes a use-after-free that shows up with j1939 in
can_create_echo_skb().
Yegor Yefremov contributes 4 patches to enhance the j1939 documentation.
Zhang Changzhong's patch fixes a hanging task problem in j1939_sk_bind()
if the netdev is down.
Then there are three patches for the newly added CAN_ISOTP protocol. Geert
Uytterhoeven enhances the kconfig help text. Oliver Hartkopp's patch adds
missing RX timeout handling in listen-only mode and Colin Ian King's patch
decreases the generated object code by 926 bytes.
Zhang Changzhong contributes a patch for the ti_hecc driver that fixes the
error path in the probe function.
Navid Emamdoost's patch for the xilinx_can driver fixes the error handling
in case of failing pm_runtime_get_sync().
There are two patches for the peak_usb driver. Dan Carpenter adds range
checking in decode operations and Stephane Grosjean's patch fixes
a timestamp wrapping problem.
Stephane Grosjean's patch for th peak_canfd driver fixes echo management if
loopback is on.
The next three patches all target the mcp251xfd driver. The first one is
by me and it increased the severity of CRC read error messages. The kernel
test robot removes an unneeded semicolon and Tom Rix removes unneeded
break in several switch-cases.
The last 4 patches are by Joakim Zhang and target the flexcan driver,
the first three fix ECC related device specific quirks for the LS1021A,
LX2160A and the VF610 SoC. The last patch disable wakeup completely upon
driver remove.
* tag 'linux-can-fixes-for-5.10-20201103' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: (27 commits)
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
can: j1939: j1939_sk_bind(): return failure if netdev is down
can: j1939: use backquotes for code samples
can: j1939: swap addr and pgn in the send example
can: j1939: fix syntax and spelling
can: j1939: rename jacd tool
...
====================
Link: https://lore.kernel.org/r/<20201103220636.972106-1-mkl@pengutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steffen Klassert says:
====================
1) Fix packet receiving of standard IP tunnels when the xfrm_interface
module is installed. From Xin Long.
2) Fix a race condition between spi allocating and hash list
resizing. From zhuoliang zhang.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The variable err is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201102121615.695196-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate the const array plen on the stack but instead it static. Makes
the object code smaller by 926 bytes.
Before:
text data bss dec hex filename
26531 1943 64 28538 6f7a net/can/isotp.o
After:
text data bss dec hex filename
25509 2039 64 27612 6bdc net/can/isotp.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201020154203.54711-1-colin.king@canonical.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
As reported by Thomas Wagner:
https://github.com/hartkopp/can-isotp/issues/34
the timeout handling for data frames is not enabled when the isotp socket is
used in listen-only mode (sockopt CAN_ISOTP_LISTEN_MODE). This mode is enabled
by the isotpsniffer application which therefore became inconsistend with the
strict rx timeout rules when running the isotp protocol in the operational
mode.
This patch fixes this inconsistency by moving the return condition for the
listen-only mode behind the timeout handling code.
Reported-by: Thomas Wagner <thwa1@web.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes: e057dd3fc2 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://github.com/hartkopp/can-isotp/issues/34
Link: https://lore.kernel.org/r/20201019120229.89326-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The help text for the CAN_ISOTP config symbol uses the acronym "PDU". However,
this acronym is not explained here, nor in Documentation/networking/can.rst.
Expand the acronym to make it easier for users to decide if they need to enable
the CAN_ISOTP option or not.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201013141341.28487-1-geert+renesas@glider.be
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When a netdev down event occurs after a successful call to
j1939_sk_bind(), j1939_netdev_notify() can handle it correctly.
But if the netdev already in down state before calling j1939_sk_bind(),
j1939_sk_release() will stay in wait_event_interruptible() blocked
forever. Because in this case, j1939_netdev_notify() won't be called and
j1939_tp_txtimer() won't call j1939_session_cancel() or other function
to clear session for ENETDOWN error, this lead to mismatch of
j1939_session_get/put() and jsk->skb_pending will never decrease to
zero.
To reproduce it use following commands:
1. ip link add dev vcan0 type vcan
2. j1939acd -r 100,80-120 1122334455667788 vcan0
3. presses ctrl-c and thread will be blocked forever
This patch adds check for ndev->flags in j1939_sk_bind() to avoid this
kind of situation and return with -ENETDOWN.
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1599460308-18770-1-git-send-email-zhangchangzhong@huawei.com
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If can_init_proc() fail to create /proc/net/can directory, can_remove_proc()
will trigger a warning:
WARNING: CPU: 6 PID: 7133 at fs/proc/generic.c:672 remove_proc_entry+0x17b0
Kernel panic - not syncing: panic_on_warn set ...
Fix to return early from can_remove_proc() if can proc_dir does not exists.
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1594709090-3203-1-git-send-email-zhangchangzhong@huawei.com
Fixes: 8e8cda6d73 ("can: initial support for network namespaces")
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Commit 394de110a7 ("net: Added pointer check for
dst->ops->neigh_lookup in dst_neigh_lookup_skb") added a test in
dst_neigh_lookup_skb() to avoid a NULL pointer dereference. The root
cause was the MPLS forwarding code, which doesn't call skb_dst_drop()
on incoming packets. That is, if the packet is received from a
collect_md device, it has a metadata_dst attached to it that doesn't
implement any dst_ops function.
To align the MPLS behaviour with IPv4 and IPv6, let's drop the dst in
mpls_forward(). This way, dst_neigh_lookup_skb() doesn't need to test
->neigh_lookup any more. Let's keep a WARN condition though, to
document the precondition and to ease detection of such problems in the
future.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/f8c2784c13faa54469a2aac339470b1049ca6b63.1604102750.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add KCOV remote annotations to ieee80211_iface_work() and
ieee80211_rx_list(). This will enable coverage-guided fuzzing of
mac80211 code that processes incoming 802.11 frames.
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remote KCOV coverage collection enables coverage-guided fuzzing of the
code that is not reachable during normal system call execution. It is
especially helpful for fuzzing networking subsystems, where it is
common to perform packet handling in separate work queues even for the
packets that originated directly from the user space.
Enable coverage-guided frame injection by adding kcov remote handle to
skb extensions. Default initialization in __alloc_skb and
__build_skb_around ensures that no socket buffer that was generated
during a system call will be missed.
Code that is of interest and that performs packet processing should be
annotated with kcov_remote_start()/kcov_remote_stop().
An alternative approach is to determine kcov_handle solely on the
basis of the device/interface that received the specific socket
buffer. However, in this case it would be impossible to distinguish
between packets that originated during normal background network
processes or were intentionally injected from the user space.
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Cc: Per Forlin <per.forlin@axis.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Oleksij Rempel <linux@rempel-privat.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
This one is interesting, the DSA tag is 8 bytes on RX and 4 bytes on TX.
Because DSA is unaware of asymmetrical tag lengths, the overhead/needed
headroom is declared as 8 bytes and therefore 4 bytes larger than it
needs to be. If this becomes a problem, and the GSWIP driver can't be
converted to a uniform header length, we might need to make DSA aware of
separate RX/TX overhead values.
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Similar to the EtherType DSA tagger, the old Marvell tagger can
transform an 802.1Q header if present into a DSA tag, so there is no
headroom required in that case. But we are ensuring that it exists,
regardless (practically speaking, the headroom must be 4 bytes larger
than it needs to be).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Note that the VLAN code path needs a smaller extra headroom than the
regular EtherType DSA path. That isn't a problem, because this tagger
declares the larger tag length (8 bytes vs 4) as the protocol overhead,
so we are covered in both cases.
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Cc: John Crispin <john@phrozen.org>
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the moment, taggers are left with the task of ensuring that the skb
headers are writable (which they aren't, if the frames were cloned for
TX timestamping, for flooding by the bridge, etc), and that there is
enough space in the skb data area for the DSA tag to be pushed.
Moreover, the life of tail taggers is even harder, because they need to
ensure that short frames have enough padding, a problem that normal
taggers don't have.
The principle of the DSA framework is that everything except for the
most intimate hardware specifics (like in this case, the actual packing
of the DSA tag bits) should be done inside the core, to avoid having
code paths that are very rarely tested.
So provide a TX reallocation procedure that should cover the known needs
of DSA today.
Note that this patch also gives the network stack a good hint about the
headroom/tailroom it's going to need. Up till now it wasn't doing that.
So the reallocation procedure should really be there only for the
exceptional cases, and for cloned packets which need to be unshared.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Christian Eggers <ceggers@arri.de> # For tail taggers only
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix smatch warning:
net/openvswitch/meter.c:427 ovs_meter_cmd_set() warn: passing zero to 'PTR_ERR'
dp_meter_create() never returns NULL, use IS_ERR
instead of IS_ERR_OR_NULL to fix this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Link: https://lore.kernel.org/r/20201031060153.39912-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During TCP fast recovery, the congestion control in charge is by
default the Proportional Rate Reduction (PRR) unless the congestion
control module specified otherwise (e.g. BBR).
Previously when tcp_packets_in_flight() is below snd_ssthresh PRR
would slow start upon receiving an ACK that
1) cumulatively acknowledges retransmitted data
and
2) does not detect further lost retransmission
Such conditions indicate the repair is in good steady progress
after the first round trip of recovery. Otherwise PRR adopts the
packet conservation principle to send only the amount that was
newly delivered (indicated by this ACK).
This patch generalizes the previous design principle to include
also the newly sent data beside retransmission: as long as
the delivery is making good progress, both retransmission and
new data should be accounted to make PRR more cautious in slow
starting.
Suggested-by: Matt Mathis <mattmathis@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201031013412.1973112-1-ycheng@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/ipv6/addrconf.c:2005: warning: Function parameter or member 'dev' not described in 'ipv6_dev_find'
net/ipv6/ip6_vti.c:138: warning: Function parameter or member 'ip6n' not described in 'vti6_tnl_bucket'
net/ipv6/ip6_tunnel.c:218: warning: Function parameter or member 'ip6n' not described in 'ip6_tnl_bucket'
net/ipv6/ip6_tunnel.c:238: warning: Function parameter or member 'ip6n' not described in 'ip6_tnl_link'
net/ipv6/ip6_tunnel.c:254: warning: Function parameter or member 'ip6n' not described in 'ip6_tnl_unlink'
net/ipv6/ip6_tunnel.c:427: warning: Function parameter or member 'raw' not described in 'ip6_tnl_parse_tlv_enc_lim'
net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'skb' not described in 'ip6_tnl_err'
net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'ipproto' not described in 'ip6_tnl_err'
net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'opt' not described in 'ip6_tnl_err'
net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'type' not described in 'ip6_tnl_err'
net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'code' not described in 'ip6_tnl_err'
net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'msg' not described in 'ip6_tnl_err'
net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'info' not described in 'ip6_tnl_err'
net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'offset' not described in 'ip6_tnl_err'
ip6_tnl_err() is an internal function, so remove the kerneldoc. For
the others, add the missing parameters.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201031183044.1082193-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 978aa04741 ("sctp: fix some type cast warnings introduced since
very beginning")' broke err reading from sctp_arg, because it reads the
value as 32-bit integer, although the value is stored as 16-bit integer.
Later this value is passed to the userspace in 16-bit variable, thus the
user always gets 0 on big-endian platforms. Fix it by reading the __u16
field of sctp_arg union, as reading err field would produce a sparse
warning.
Fixes: 978aa04741 ("sctp: fix some type cast warnings introduced since very beginning")
Signed-off-by: Petr Malat <oss@malat.biz>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20201030132633.7045-1-oss@malat.biz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/9p/client.c:420: warning: Function parameter or member 'c' not described in 'p9_client_cb'
net/9p/client.c:420: warning: Function parameter or member 'req' not described in 'p9_client_cb'
net/9p/client.c:420: warning: Function parameter or member 'status' not described in 'p9_client_cb'
net/9p/client.c:568: warning: Function parameter or member 'uidata' not described in 'p9_check_zc_errors'
net/9p/trans_common.c:23: warning: Function parameter or member 'nr_pages' not described in 'p9_release_pages'
net/9p/trans_common.c:23: warning: Function parameter or member 'pages' not described in 'p9_release_pages'
net/9p/trans_fd.c:132: warning: Function parameter or member 'rreq' not described in 'p9_conn'
net/9p/trans_fd.c:132: warning: Function parameter or member 'wreq' not described in 'p9_conn'
net/9p/trans_fd.c:56: warning: Function parameter or member 'privport' not described in 'p9_fd_opts'
net/9p/trans_rdma.c:113: warning: Function parameter or member 'cqe' not described in 'p9_rdma_context'
net/9p/trans_rdma.c:129: warning: Function parameter or member 'privport' not described in 'p9_rdma_opts'
net/9p/trans_virtio.c:215: warning: Function parameter or member 'limit' not described in 'pack_sg_list_p'
net/9p/trans_virtio.c:83: warning: Function parameter or member 'chan_list' not described in 'virtio_chan'
net/9p/trans_virtio.c:83: warning: Function parameter or member 'p9_max_pages' not described in 'virtio_chan'
net/9p/trans_virtio.c:83: warning: Function parameter or member 'ring_bufs_avail' not described in 'virtio_chan'
net/9p/trans_virtio.c:83: warning: Function parameter or member 'tag' not described in 'virtio_chan'
net/9p/trans_virtio.c:83: warning: Function parameter or member 'vc_wq' not described in 'virtio_chan'
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Dominique Martinet <asmadeus@codewreck.org>
Link: https://lore.kernel.org/r/20201031182655.1082065-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* HE on 2.4 GHz
* a few issues syzbot found, but we have many more reports :-(
* a regression in nl80211-transported EAPOL frames which had
affected a number of users, from Mathy
* kernel-doc markings in mac80211, from Mauro
* a format argument in reg.c, from Ye Bin
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl+b4B0ACgkQB8qZga/f
l8SdOg/+PcKqoNXh+VgP2ZvCcN3D/3ow5mf9OgNijPoM35a4sJZdGpRWp1oUXK8O
1NbyE/mL3TiNS5HIb+jF03+pHOl+ysh0AE3QsQeZwn+bHkU61T9J477NQr4Y9hQ0
eJZxDjgSUsJhx1xnGPH0QFUi7zaFQAfy1Q5AVCPP5ywEOovTuOY9Qw/7D2EZoh5L
k2H1kjLIle2VCckqrL5pno3dz1lAGRZ5RGGiP8/ATfBH6pYON9yFSflc9x6azTS2
vWyfZxzbrFWvT2YFMwNUnNl4oNjLIvGYmYzULp9MweyFA6lfZpIOAcGtVoM98nD7
wu6KWeozo4c+3D25kqxRlpE6fILJ+uiCKcHV+7GyLsDkp9s2onE5f/UHmcP+pGkh
QE/ubTbe2brWSpPwHyAXEg1FQ3WPmJj90Tr3OA2j+rIfh/+eUH/inoxvddqyOamR
mBr8M1VRY8+PRAru9UKU+EG4CueX5GALxbOH8rJFtlDsJz33CxGm1sxpAc9pR4CX
XYuaPYsunok7/tXRxXulaCE0B6DOqyhX8L7drVJ0nEpAv7J3WfkZxlFs8+3VPBG7
BMiwNqNMKHM613a16vvl8ZOsguzRPIxhPLWVbcUb5c6NQq+4FOn5pZYtZU8jZ19e
w3iDPbfaCCARnf5U5JKqTt45q+71rbU9pSncMjy7r5/r1hPGwiQ=
=2AGz
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-net-2020-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A couple of fixes, for
* HE on 2.4 GHz
* a few issues syzbot found, but we have many more reports :-(
* a regression in nl80211-transported EAPOL frames which had
affected a number of users, from Mathy
* kernel-doc markings in mac80211, from Mauro
* a format argument in reg.c, from Ye Bin
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enhance validation to support for reject from inet ingress chains.
Note that, reject from inet ingress and netdev ingress differ.
Reject packets from inet ingress are sent through ip_local_out() since
inet reject emulates the IP layer receive path. So the reject packet
follows to classic IP output and postrouting paths.
The reject action from netdev ingress assumes the packet not yet entered
the IP layer, so the reject packet is sent through dev_queue_xmit().
Therefore, reject packets from netdev ingress do not follow the classic
IP output and postrouting paths.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nft_request_module calls vsnprintf() using parameters passed to it.
Make the function with __printf() attribute so the compiler can check
the format and arguments.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Incorrect netlink report logic in flowtable and genID.
2) Add a selftest to check that wireguard passes the right sk
to ip_route_me_harder, from Jason A. Donenfeld.
3) Pass the actual sk to ip_route_me_harder(), also from Jason.
4) Missing expression validation of updates via nft --check.
5) Update byte and packet counters regardless of whether they
match, from Stefano Brivio.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit cited below has changed only the functional prototype of
br_multicast_querier_exists, but forgot to do that for the stub
prototype (the one where CONFIG_BRIDGE_IGMP_SNOOPING is disabled).
Fixes: 955062b03f ("net: bridge: mcast: add support for raw L2 multicast groups")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201101000845.190009-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tunnel device such as vxlan, bareudp and geneve in the lwt mode set
the outer df only based TUNNEL_DONT_FRAGMENT.
And this was also the behavior for gre device before switching to use
ip_md_tunnel_xmit in commit 962924fa2b ("ip_gre: Refactor collect
metatdata mode tunnel xmit to ip_md_tunnel_xmit")
When the ip_gre in lwt mode xmit with ip_md_tunnel_xmi changed the rule and
make the discrepancy between handling of DF by different tunnels. So in the
ip_md_tunnel_xmit should follow the same rule like other tunnels.
Fixes: cfc7381b30 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Link: https://lore.kernel.org/r/1604028728-31100-1-git-send-email-wenxu@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To allow better problem diagnosis the return codes for SMC-Dv2 are
improved by this patch. A few more CLC DECLINE codes are defined and
sent to the peer when an SMC connection cannot be established.
There are now multiple SMC variations that are offered by the client and
the server may encounter problems to initialize all of them.
Because only one diagnosis code can be sent to the client the decision
was made to send the first code that was encountered. Because the server
tries the variations in the order of importance (SMC-Dv2, SMC-D, SMC-R)
this makes sure that the diagnosis code of the most important variation
is sent.
v2: initialize rc in smc_listen_v2_check().
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20201031181938.69903-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hi Linus,
Please, pull the following patches that replace zero-length arrays with
flexible-array members.
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl+cjRUACgkQRwW0y0cG
2zGWAhAAjUfTsAmXWhKNaWFSCYR0Q822puTUWOKfiBd+jjGaO04luTtr2gjv2Dkb
Vgad8H4N8oZU79xfh5JZ5PUyScaso8wE6ZJTh2PLKXpKmNd213f5x/pIt78CCDTa
Y1L/eR41mmveTL3VNS3sf6WaZpT9owxJKGIY8JgdiOmSjxJQpX5zdaC1KYso4eXr
lIXIRo9VLEmVLhhHhZi+QmX6+aQ05E1D9K0ENe4/uEnRsV525W78iwZ4fYeLzr+A
krEOdgx6sPgzajPYnHoayrrcKNKxD5YY1SWuVSm2tqYYIhlRoK3f5xgLOd10RiHE
YMgx8aWzGmGJwoUhgp1bo/l9EZ7O8OWRqM/GOP4x6Wgjdhqw2x5jgskmhsKNGEXu
/BlbS+qL5aUrMCxhvNbApuZW6xBiBbva76MH3vU9vFhZbVz1CHLQdGI0tfxggYWS
jc2UPgoxL9OQlf3jSc+gK7RMFhBGNWn2Aiy8GQas3BxPYXuYPvwOj+irDOG/qZ9D
VZ5swUw4+th+DsF5K53mEFeLv0fONMgL9Ka5bNR6+k6HG0WNLYYVOiet3xYUDo1f
eZbMZthfc+QW7R8cwG0WuFk6rC6mLqE+A9nQuLZoJD+VMuJd4pwW9+6EW8nDX08w
FS4/o92xUFJfOCgaLRS61FSAuSmFENieN+yoKMK/Uf6PJVdNMb4=
=vyu3
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull more flexible-array member conversions from Gustavo A. R. Silva:
"Replace zero-length arrays with flexible-array members"
* tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
printk: ringbuffer: Replace zero-length array with flexible-array member
net/smc: Replace zero-length array with flexible-array member
net/mlx5: Replace zero-length array with flexible-array member
mei: hw: Replace zero-length array with flexible-array member
gve: Replace zero-length array with flexible-array member
Bluetooth: btintel: Replace zero-length array with flexible-array member
scsi: target: tcmu: Replace zero-length array with flexible-array member
ima: Replace zero-length array with flexible-array member
enetc: Replace zero-length array with flexible-array member
fs: Replace zero-length array with flexible-array member
Bluetooth: Replace zero-length array with flexible-array member
params: Replace zero-length array with flexible-array member
tracepoint: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member
mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member
dmaengine: ti-cppi5: Replace zero-length array with flexible-array member
Based on RFC 8200, Section 4.5 Fragment Header:
- If the first fragment does not include all headers through an
Upper-Layer header, then that fragment should be discarded and
an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero.
Checking each packet header in IPv6 fast path will have performance impact,
so I put the checking in ipv6_frag_rcv().
As the packet may be any kind of L4 protocol, I only checked some common
protocols' header length and handle others by (offset + 1) > skb->len.
Also use !(frag_off & htons(IP6_OFFSET)) to catch atomic fragments
(fragmented packet with only one fragment).
When send ICMP error message, if the 1st truncated fragment is ICMP message,
icmp6_send() will break as is_ineligible() return true. So I added a check
in is_ineligible() to let fragment packet with nexthdr ICMP but no ICMP header
return false.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The position index in leq_seq_next is not updated when the next
entry is fetched an no more entries are available. This causes
seq_file to report the following error:
"seq_file: buggy .next function lec_seq_next [lec] did not update
position index"
Fix this by always updating the position index.
[ Note: this is an ancient 2002 bug, the sha is from the
tglx/history repo ]
Fixes 4aea2cbff417 ("[ATM]: Move lan seq_file ops to lec.c [1/3]")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201027114925.21843-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It makes possible to reproduce exactly the same set after a save/restore.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The parameter defines the upper limit in any hash bucket at adding new entries
from userspace - if the limit would be exceeded, ipset doubles the hash size
and rehashes. It means the set may consume more memory but gives faster
evaluation at matching in the set.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>