To free the skb in normal course of processing, consume_skb() should be
used. Only for failure paths, skb_free() is intended to be used.
https://www.kernel.org/doc/htmldocs/networking/API-consume-skb.html
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb free-ed in:
1/ condition 1: tipc_sk_filter_rcv -> tipc_sk_proto_rcv
2/ condition 2: tipc_sk_filter_rcv -> tipc_group_filter_msg
This leads to a "use-after-free" access in the next condition.
We fix this by intializing the variable at declaration, then it is safe
to check this variable to continue processing if condition matches.
syzbot report:
==================================================================
BUG: KASAN: use-after-free in tipc_sk_filter_rcv+0x2166/0x34f0
net/tipc/socket.c:2167
Read of size 4 at addr ffff88808ea58534 by task kworker/u4:0/7
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.0.0+ #61
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: tipc_send tipc_conn_send_work
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
__asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
tipc_sk_filter_rcv+0x2166/0x34f0 net/tipc/socket.c:2167
tipc_sk_enqueue net/tipc/socket.c:2254 [inline]
tipc_sk_rcv+0xc45/0x25a0 net/tipc/socket.c:2305
tipc_topsrv_kern_evt+0x3b7/0x580 net/tipc/topsrv.c:610
tipc_conn_send_to_sock+0x43e/0x5f0 net/tipc/topsrv.c:283
tipc_conn_send_work+0x65/0x80 net/tipc/topsrv.c:303
process_one_work+0x98e/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x357/0x430 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Reported-by: syzbot+e863893591cc7a622e40@syzkaller.appspotmail.com
Fixes: c55c8eda ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In addition to icmp_echo_ignore_multicast, there is a need to also
prevent responding to pings to anycast addresses for security.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the previous patch, all the callers of ndo_select_queue()
provide as a 'fallback' argument netdev_pick_tx.
The only exceptions are nested calls to ndo_select_queue(),
which pass down the 'fallback' available in the current scope
- still netdev_pick_tx.
We can drop such argument and replace fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet
in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
with device drivers implementing such ndo. It also clean the code
a bit.
Tested with ixgbe and CONFIG_FCOE=m
With pktgen using queue xmit:
threads vanilla patched
(kpps) (kpps)
1 2334 2428
2 4166 4278
4 7895 8100
v1 -> v2:
- rebased after helper's name change
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently packet_pick_tx_queue() is the only caller of
ndo_select_queue() using a fallback argument other than
netdev_pick_tx.
Leveraging rx queue, we can obtain a similar queue selection
behavior using core helpers. After this change, ndo_select_queue()
is always invoked with netdev_pick_tx() as fallback.
We can change ndo_select_queue() signature in a followup patch,
dropping an indirect call per transmitted packet in some scenarios
(e.g. TCP syn and XDP generic xmit)
This changes slightly how af packet queue selection happens when
PACKET_QDISC_BYPASS is set. It's now more similar to plan dev_queue_xmit()
tacking in account both XPS and TC mapping.
v1 -> v2:
- rebased after helper name change
RFC -> v1:
- initialize sender_cpu to the expected value
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the following patches, we are going to use __netdev_pick_tx() in
many modules. Rename it to netdev_pick_tx(), to make it clear is
a public API.
Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
to avoid name clashes.
Suggested-by: Eric Dumazet <edumazet@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to use eth_broadcast_addr() to assign broadcast address
insetad of memset().
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for AES128-CCM based record encryption. AES128-CCM is
similar to AES128-GCM. Both of them have same salt/iv/mac size. The
notable difference between the two is that while invoking AES128-CCM
operation, the salt||nonce (which is passed as IV) has to be prefixed
with a hardcoded value '2'. Further, CCM implementation in kernel
requires IV passed in crypto_aead_request() to be full '16' bytes.
Therefore, the record structure 'struct tls_rec' has been modified to
reserve '16' bytes for IV. This works for both GCM and CCM based cipher.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 has icmp_echo_ignore_broadcast to prevent responding to broadcast pings.
IPv6 needs a similar mechanism.
v1->v2:
- Remove NET_IPV6_ICMP_ECHO_IGNORE_MULTICAST.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the request socket is created locally, it'd make more sense to
use reqsk_free() instead of reqsk_put() in TFO and syncookies' error
path.
However, tcp_get_cookie_sock() may set ->rsk_refcnt before freeing the
socket; tcp_conn_request() may also have non-null ->rsk_refcnt because
of tcp_try_fastopen(). In both cases 'req' hasn't been exposed
to the outside world and is safe to free immediately, but that'd
trigger the WARN_ON_ONCE in reqsk_free().
Define __reqsk_free() for these situations where we know nobody's
referencing the socket, even though ->rsk_refcnt might be non-null.
Now we can consolidate the error path of tcp_get_cookie_sock() and
tcp_conn_request().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warning:
net/core/datagram.c:411:5: warning:
symbol '__skb_datagram_iter' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP ipv6 fast path dereferences a pointer to get to the inet6
part of a tcp socket, but given the fixed memory placement,
we can do better and avoid a possible cache line miss.
This also reduces register pressure, since we let the compiler
know about this memory placement.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, a multicast stream may start out using replicast, because
there are few destinations, and then it should ideally switch to
L2/broadcast IGMP/multicast when the number of destinations grows beyond
a certain limit. The opposite should happen when the number decreases
below the limit.
To eliminate the risk of message reordering caused by method change,
a sending socket must stick to a previously selected method until it
enters an idle period of 5 seconds. Means there is a 5 seconds pause
in the traffic from the sender socket.
If the sender never makes such a pause, the method will never change,
and transmission may become very inefficient as the cluster grows.
With this commit, we allow such a switch between replicast and
broadcast without any need for a traffic pause.
Solution is to send a dummy message with only the header, also with
the SYN bit set, via broadcast or replicast. For the data message,
the SYN bit is set and sending via replicast or broadcast (inverse
method with dummy).
Then, at receiving side any messages follow first SYN bit message
(data or dummy message), they will be held in deferred queue until
another pair (dummy or data message) arrived in other link.
v2: reverse christmas tree declaration
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation for introducing a smooth switching between replicast
and broadcast method for multicast message, We have to introduce a new
capability flag TIPC_MCAST_RBCTL to handle this new feature.
During a cluster upgrade a node can come back with this new capabilities
which also must be reflected in the cluster capabilities field.
The new feature is only applicable if all node in the cluster supports
this new capability.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, a multicast stream uses either broadcast or replicast as
transmission method, based on the ratio between number of actual
destinations nodes and cluster size.
However, when an L2 interface (e.g., VXLAN) provides pseudo
broadcast support, this becomes very inefficient, as it blindly
replicates multicast packets to all cluster/subnet nodes,
irrespective of whether they host actual target sockets or not.
The TIPC multicast algorithm is able to distinguish real destination
nodes from other nodes, and hence provides a smarter and more
efficient method for transferring multicast messages than
pseudo broadcast can do.
Because of this, we now make it possible for users to force
the broadcast link to permanently switch to using replicast,
irrespective of which capabilities the bearer provides,
or pretend to provide.
Conversely, we also make it possible to force the broadcast link
to always use true broadcast. While maybe less useful in
deployed systems, this may at least be useful for testing the
broadcast algorithm in small clusters.
We retain the current AUTOSELECT ability, i.e., to let the broadcast link
automatically select which algorithm to use, and to switch back and forth
between broadcast and replicast as the ratio between destination
node number and cluster size changes. This remains the default method.
Furthermore, we make it possible to configure the threshold ratio for
such switches. The default ratio is now set to 10%, down from 25% in the
earlier implementation.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_STREAM_SCHEDULER sockopt.
Fixes: 7efba10d6b ("sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_EVENT sockopt.
Fixes: d251f05e3b ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_EVENT sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_ENABLE_STREAM_RESET sockopt.
Fixes: 99a62135e1 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_ENABLE_STREAM_RESET sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DEFAULT_PRINFO sockopt.
Fixes: 3a583059d1 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_PRINFO sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_DEACTIVATE_KEY sockopt.
Fixes: 2af66ff3ed ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DEACTIVATE_KEY sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_DELETE_KEY sockopt.
Fixes: 3adcc30060 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DELETE_KEY sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_ACTIVE_KEY sockopt.
Fixes: bf9fb6ad4f ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_ACTIVE_KEY sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_KEY sockopt.
Fixes: 7fb3be13a2 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_KEY sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_MAX_BURST sockopt.
Fixes: e0651a0dc8 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_MAX_BURST sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_CONTEXT sockopt.
Fixes: 49b037acca ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_CONTEXT sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DEFAULT_SNDINFO sockopt.
Fixes: 92fc3bd928 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SNDINFO sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DELAYED_SACK sockopt.
Fixes: 9c5829e1c4 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DELAYED_SACK sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if the user pass an invalid asoc_id to SCTP_DEFAULT_SEND_PARAM
on a TCP-style socket, it will silently ignore the new parameters.
That's because after not finding an asoc, it is checking asoc_id against
the known values of CURRENT/FUTURE/ALL values and that fails to match.
IOW, if the user supplies an invalid asoc id or not, it should either
match the current asoc or the socket itself so that it will inherit
these later. Fixes it by forcing asoc_id to SCTP_FUTURE_ASSOC in case it
is a TCP-style socket without an asoc, so that the values get set on the
socket.
Fixes: 707e45b3dc ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SEND_PARAM sockopt")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp_copy_descendant() copies pd_lobby from old sctp scok to new
sctp sock. If sctp_sock_migrate() returns error, it will panic when
releasing new sock and trying to purge pd_lobby due to the incorrect
pointers in pd_lobby.
[ 120.485116] kasan: CONFIG_KASAN_INLINE enabled
[ 120.486270] kasan: GPF could be caused by NULL-ptr deref or user
[ 120.509901] Call Trace:
[ 120.510443] sctp_ulpevent_free+0x1e8/0x490 [sctp]
[ 120.511438] sctp_queue_purge_ulpevents+0x97/0xe0 [sctp]
[ 120.512535] sctp_close+0x13a/0x700 [sctp]
[ 120.517483] inet_release+0xdc/0x1c0
[ 120.518215] __sock_release+0x1d2/0x2a0
[ 120.519025] sctp_do_peeloff+0x30f/0x3c0 [sctp]
We fix it by not copying sctp_sock pd_lobby in sctp_copy_descendan(),
and skb_queue_head_init() can also be removed in sctp_sock_migrate().
Reported-by: syzbot+85e0b422ff140b03672a@syzkaller.appspotmail.com
Fixes: 89664c6236 ("sctp: sctp_sock_migrate() returns error if sctp_bind_addr_dup() fails")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I am using "protocol ip" filters in TC to manipulate TC flower
classifiers, which are only available with "protocol ip". However,
I faced an issue that packets sent via raw sockets with ETH_P_ALL
did not match the ip filters even if they did satisfy the condition
(e.g., DHCP offer from dhcpd).
I have determined that the behavior was caused by an unexpected
value stored in skb->protocol, namely, ETH_P_ALL instead of ETH_P_IP,
when packets were sent via raw sockets with ETH_P_ALL set.
IMHO, storing ETH_P_ALL in skb->protocol is not appropriate for
packets sent via raw sockets because ETH_P_ALL is not a real ether
type used on wire, but a virtual one.
This patch fixes the tx protocol selection in cases of transmission
via raw sockets created with ETH_P_ALL so that it asks the driver to
extract protocol from the Ethernet header.
Fixes: 75c65772c3 ("net/packet: Ask driver for protocol if not provided by user")
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@lab.ntt.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using fanouts with AF_PACKET, the demux functions such as
fanout_demux_cpu will return an index in the fanout socket array, which
corresponds to the selected socket.
The ordering of this array depends on the order the sockets were added
to a given fanout group, so for FANOUT_CPU this means sockets are bound
to cpus in the order they are configured, which is OK.
However, when stopping then restarting the interface these sockets are
bound to, the sockets are reassigned to the fanout group in the reverse
order, due to the fact that they were inserted at the head of the
interface's AF_PACKET socket list.
This means that traffic that was directed to the first socket in the
fanout group is now directed to the last one after an interface restart.
In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
socket that used to receive traffic from the last CPU after an interface
restart.
This commit introduces a helper to add a socket at the tail of a list,
then uses it to register AF_PACKET sockets.
Note that this changes the order in which sockets are listed in /proc and
with sock_diag.
Fixes: dc99f60069 ("packet: Add fanout support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move the check that prevents connecting service ranges to after
the RDM/DGRAM check, and move address sanity control to a separate
function that also validates the service range.
Fixes: 23998835be ("tipc: improve address sanity check in tipc_connect()")
Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_nest_start may fail. The fix check its status and returns
-EMSGSIZE in case it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-03-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a umem memory leak on cleanup in AF_XDP, from Björn.
2) Fix BTF to properly resolve forward-declared enums into their corresponding
full enum definition types during deduplication, from Andrii.
3) Fix libbpf to reject invalid flags in xsk_socket__create(), from Magnus.
4) Fix accessing invalid pointer returned from bpf_tcp_sock() and
bpf_sk_fullsock() after bpf_sk_release() was called, from Martin.
5) Fix generation of load/store DW instructions in PPC JIT, from Naveen.
6) Various fixes in BPF helper function documentation in bpf.h UAPI header
used to bpf-helpers(7) man page, from Quentin.
7) Fix segfault in BPF test_progs when prog loading failed, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_nest_start could fail and requires a check. The fix returns
-EMSGSIZE if it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_nest_start may fail and thus deserves a check.
The fix returns -EMSGSIZE in case it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_nest_start may fail and thus deserves a check.
The fix returns -EMSGSIZE when it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
upcall is dereferenced even when genlmsg_put fails. The fix
goto out to avoid the NULL pointer dereference in this case.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the umem is cleaned up, the task that created it might already be
gone. If the task was gone, the xdp_umem_release function did not free
the pages member of struct xdp_umem.
It turned out that the task lookup was not needed at all; The code was
a left-over when we moved from task accounting to user accounting [1].
This patch fixes the memory leak by removing the task lookup logic
completely.
[1] https://lore.kernel.org/netdev/20180131135356.19134-3-bjorn.topel@gmail.com/
Link: https://lore.kernel.org/netdev/c1cb2ca8-6a14-3980-8672-f3de0bb38dfd@suse.cz/
Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Adds missing sphinx documentation to the
socket.c's functions. Also fixes some whitespaces.
I also changed the style of older documentation as an
effort to have an uniform documentation style.
Signed-off-by: Pedro Tammela <pctammela@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case create_singlethread_workqueue fails, the check returns
an error to callers to avoid potential NULL pointer dereferences.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
We initially interpreted the fwmark parameter as a flag that simply turned
on the feature, using the whole skb->mark field as the index into the CAKE
tin_order array. However, it is quite common for different applications to
use different parts of the mask field for their own purposes, each using a
different mask.
Support this use of subsets of the mark by interpreting the TCA_CAKE_FWMARK
parameter as a bitmask to apply to the fwmark field when reading it. The
result will be right-shifted by the number of unset lower bits of the mask
before looking up the tin.
In the original commit message we also failed to credit Felix Resch with
originally suggesting the fwmark feature back in 2017; so the Suggested-By
in this commit covers the whole fwmark feature.
Fixes: 0b5c7efdfc ("sch_cake: Permit use of connmarks as tin classifiers")
Suggested-by: Felix Resch <fuller@beif.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
register_snap_client may return NULL, all the callers
check it, but only print a warning. This will result in
NULL pointer dereference in unregister_snap_client and other
places.
It has always been used like this since v2.6
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"More fixes in the queue:
1) Netfilter nat can erroneously register the device notifier twice,
fix from Florian Westphal.
2) Use after free in nf_tables, from Pablo Neira Ayuso.
3) Parallel update of steering rule fix in mlx5 river, from Eli
Britstein.
4) RX processing panic in lan743x, fix from Bryan Whitehead.
5) Use before initialization of TCP_SKB_CB, fix from Christoph Paasch.
6) Fix locking in SRIOV mode of mlx4 driver, from Jack Morgenstein.
7) Fix TX stalls in lan743x due to mishandling of interrupt ACKing
modes, from Bryan Whitehead.
8) Fix infoleak in l2tp_ip6_recvmsg(), from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
pptp: dst_release sk_dst_cache in pptp_sock_destruct
MAINTAINERS: GENET & SYSTEMPORT: Add internal Broadcom list
l2tp: fix infoleak in l2tp_ip6_recvmsg()
net/tls: Inform user space about send buffer availability
net_sched: return correct value for *notify* functions
lan743x: Fix TX Stall Issue
net/mlx4_core: Fix qp mtt size calculation
net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling
net/mlx4_core: Fix reset flow when in command polling mode
mlxsw: minimal: Initialize base_mac
mlxsw: core: Prevent duplication during QSFP module initialization
net: dwmac-sun8i: fix a missing check of of_get_phy_mode
net: sh_eth: fix a missing check of of_get_phy_mode
net: 8390: fix potential NULL pointer dereferences
net: fujitsu: fix a potential NULL pointer dereference
net: qlogic: fix a potential NULL pointer dereference
isdn: hfcpci: fix potential NULL pointer dereference
Documentation: devicetree: add a new optional property for port mac address
net: rocker: fix a potential NULL pointer dereference
net: qlge: fix a potential NULL pointer dereference
...
A previous fix ("tls: Fix write space handling") assumed that
user space application gets informed about the socket send buffer
availability when tls_push_sg() gets called. Inside tls_push_sg(), in
case do_tcp_sendpages() returns 0, the function returns without calling
ctx->sk_write_space. Further, the new function tls_sw_write_space()
did not invoke ctx->sk_write_space. This leads to situation that user
space application encounters a lockup always waiting for socket send
buffer to become available.
Rather than call ctx->sk_write_space from tls_push_sg(), it should be
called from tls_write_space. So whenever tcp stack invokes
sk->sk_write_space after freeing socket send buffer, we always declare
the same to user space by the way of invoking ctx->sk_write_space.
Fixes: 7463d3a2db ("tls: Fix write space handling")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is confusing to directly use return value of netlink_send()/
netlink_unicast() as the return value of *notify*, as it may be not
error at all.
Example: in tc_del_tfilter(), after calling tfilter_del_notify(), it will
goto errout if (err). However, the netlink_send()/netlink_unicast() will
return positive value even for successful case. So it may not call
tcf_chain_tp_remove() and so on to clean up the resource, as a result,
resource is leaked.
It may be easier to only check the return value of tfilter_del_nofiy(),
but it is more clean to correct all related functions.
Co-developed-by: Zengmo Gao <gaozengmo@jd.com>
Signed-off-by: Zhike Wang <wangzhike@jd.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new helper "struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)"
which returns a bpf_sock in TCP_LISTEN state. It will trace back to
the listener sk from a request_sock if possible. It returns NULL
for all other cases.
No reference is taken because the helper ensures the sk is
in SOCK_RCU_FREE (where the TCP_LISTEN sock should be in).
Hence, bpf_sk_release() is unnecessary and the verifier does not
allow bpf_sk_release(listen_sk) to be called either.
The following is also allowed because the bpf_prog is run under
rcu_read_lock():
sk = bpf_sk_lookup_tcp();
/* if (!sk) { ... } */
listen_sk = bpf_get_listener_sock(sk);
/* if (!listen_sk) { ... } */
bpf_sk_release(sk);
src_port = listen_sk->src_port; /* Allowed */
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>