The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.
Fixes: d862e54614 ("net: ipv6: Implement /proc/net/icmp6.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tcp sends a TSO packet, adding a PSH flag on it
reduces the sojourn time of GRO packet in GRO receivers.
This is particularly the case under pressure, since RX queues
receive packets for many concurrent flows.
A sender can give a hint to GRO engines when it is
appropriate to flush a super-packet, especially when pacing
is in the picture, since next packet is probably delayed by
one ms.
Having less packets in GRO engine reduces chance
of LRU eviction or inflated RTT, and reduces GRO cost.
We found recently that we must not set the PSH flag on
individual full-size MSS segments [1] :
Under pressure (CWR state), we better let the packet sit
for a small delay (depending on NAPI logic) so that the
ACK packet is delayed, and thus next packet we send is
also delayed a bit. Eventually the bottleneck queue can
be drained. DCTCP flows with CWND=1 have demonstrated
the issue.
This patch allows to slowdown the aggregate traffic without
involving high resolution timers on senders and/or
receivers.
It has been used at Google for about four years,
and has been discussed at various networking conferences.
[1] segments smaller than MSS already have PSH flag set
by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE
has been requested by the user.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
TCP_ECN_QUEUE_CWR.
Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
the behavior of data receivers, and deciding whether to reflect
incoming IP ECN CE marks as outgoing TCP th->ece marks. The
TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
is only called from tcp_undo_cwnd_reduction() by data senders during
an undo, so it should zero the sender-side state,
TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
incoming CE bits on incoming data packets just because outgoing
packets were spuriously retransmitted.
The bug has been reproduced with packetdrill to manifest in a scenario
with RFC3168 ECN, with an incoming data packet with CE bit set and
carrying a TCP timestamp value that causes cwnd undo. Before this fix,
the IP CE bit was ignored and not reflected in the TCP ECE header bit,
and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
even though the cwnd reduction had been undone. After this fix, the
sender properly reflects the CE bit and does not set the W bit.
Note: the bug actually predates 2005 git history; this Fixes footer is
chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
the patch applies cleanly (since before this commit the code was in a
.h file).
Fixes: bdf1ee5d3b ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the equivalent of commit 2c6b55f45d ("ipv6: fix neighbour
resolution with raw socket") for ip6_confirm_neigh(): we can send a
packet with MSG_CONFIRM on a raw socket for a connected route, so the
gateway would be :: here, and we should pick the next hop using
rt6_nexthop() instead.
This was found by code review and, to the best of my knowledge, doesn't
actually fix a practical issue: the destination address from the packet
is not considered while confirming a neighbour, as ip6_confirm_neigh()
calls choose_neigh_daddr() without passing the packet, so there are no
similar issues as the one fixed by said commit.
A possible source of issues with the existing implementation might come
from the fact that, if we have a cached dst, we won't consider it,
while rt6_nexthop() takes care of that. I might just not be creative
enough to find a practical problem here: the only way to affect this
with cached routes is to have one coming from an ICMPv6 redirect, but
if the next hop is a directly connected host, there should be no
topology for which a redirect applies here, and tests with redirected
routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on
raw sockets destined to a directly connected host.
However, directly using the dst gateway here is not consistent anymore
with neighbour resolution, and, in general, as we want the next hop,
using rt6_nexthop() looks like the only sane way to fetch it.
Reported-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rds_bind(), an rds_sock is added to the RDS bind hash table before
rs_transport is set. This means that the socket can be found by the
receive code path when rs_transport is NULL. And the receive code
path de-references rs_transport for congestion update check. This can
cause a panic. An rds_sock should not be added to the bind hash table
before all the needed fields are set.
Reported-by: syzbot+4b4f8163c2e246df3c4c@syzkaller.appspotmail.com
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Layer 2 Update frame is used to update bridges when a station roams
to another AP even if that STA does not transmit any frames after the
reassociation. This behavior was described in IEEE Std 802.11F-2003 as
something that would happen based on MLME-ASSOCIATE.indication, i.e.,
before completing 4-way handshake. However, this IEEE trial-use
recommended practice document was published before RSN (IEEE Std
802.11i-2004) and as such, did not consider RSN use cases. Furthermore,
IEEE Std 802.11F-2003 was withdrawn in 2006 and as such, has not been
maintained amd should not be used anymore.
Sending out the Layer 2 Update frame immediately after association is
fine for open networks (and also when using SAE, FT protocol, or FILS
authentication when the station is actually authenticated by the time
association completes). However, it is not appropriate for cases where
RSN is used with PSK or EAP authentication since the station is actually
fully authenticated only once the 4-way handshake completes after
authentication and attackers might be able to use the unauthenticated
triggering of Layer 2 Update frame transmission to disrupt bridge
behavior.
Fix this by postponing transmission of the Layer 2 Update frame from
station entry addition to the point when the station entry is marked
authorized. Similarly, send out the VLAN binding update only if the STA
entry has already been authorized.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* a fix in the new 6 GHz channel support
* a fix for recent minstrel (rate control) updates
for an infinite loop
* handle interface type changes better wrt. management frame
registrations (for management frames sent to userspace)
* add in-BSS RX time to survey information
* handle HW rfkill properly if !CONFIG_RFKILL
* send deauth on IBSS station expiry, to avoid state mismatches
* handle deferred crypto tailroom updates in mac80211 better
when device restart happens
* fix a spectre-v1 - really a continuation of a previous patch
* advertise NL80211_CMD_UPDATE_FT_IES as supported if so
* add some missing parsing in VHT extended NSS support
* support HE in mac80211_hwsim
* let mac80211 drivers determine the max MTU themselves
along with the usual cleanups etc.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl148l4ACgkQB8qZga/f
l8TafQ//Yjunnxq49ClKMDxyKwxIBvLRAMr4D3QwaWAcWUpar1/V0Ft/5glnHtbZ
7QptwbxVNUk/N68hYi8wlpMpvfzv/nShZD8QBNS1bk5E3Gng3yow03LhOx5iaYb2
KJXS1GE4jwkMD7Xn65+eMeb8rt1vEj8LleX91cguilq+y5YbcNFsP1nil2RtQyBU
cf5i8CBu4I5rTBoFaRvcz2xn+blqPSm2/piA+yXjzFp9vyVmhD+FjR5T482u48pj
wi/1zersGVUzBNElnZOKg67XPir1fcJqCfLILr7okPItWuYXHdVHGfn+arhimK4W
dyIpv1EfCe0lwKl4VTdhXt1GwhKvWCc2Ja7lz/RnDisGq9CYPJNfW7jFgR3tw4eg
SccnUhnxRIgD1V2KDcvsRadPo+2YsBJBmC61JRX1K6L3zpHLbktDyzxvTwCeVQ9A
TQp5bmQcKqDoq+/60JNHI6IxMbmX/vc2PC7dENGWUkem/JEmSWBB1wcL9gsRkhVi
c4uHyvFeXJaIV7YFA36hWCfa0fr+UUYxfzxudRSvxq/tTpayqBKu6fr3Pvv0SqTj
/BKkezoIdJntClhJv4PcDkZMva4uMCPtHCero9eICX5J+4AanD6caRNefX6L0PfF
DtN1sscVOy9zY2fV4tqvn3IASmOE5yB/dxjtKkMYyNHkZsGiDI0=
=HAuT
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2019-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
We have a number of changes, but things are settling down:
* a fix in the new 6 GHz channel support
* a fix for recent minstrel (rate control) updates
for an infinite loop
* handle interface type changes better wrt. management frame
registrations (for management frames sent to userspace)
* add in-BSS RX time to survey information
* handle HW rfkill properly if !CONFIG_RFKILL
* send deauth on IBSS station expiry, to avoid state mismatches
* handle deferred crypto tailroom updates in mac80211 better
when device restart happens
* fix a spectre-v1 - really a continuation of a previous patch
* advertise NL80211_CMD_UPDATE_FT_IES as supported if so
* add some missing parsing in VHT extended NSS support
* support HE in mac80211_hwsim
* let mac80211 drivers determine the max MTU themselves
along with the usual cleanups etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently frame registrations are not purged, even when changing the
interface type. This can lead to potentially weird situations where
frames possibly not allowed on a given interface type remain registered
due to the type switching happening after registration.
The kernel currently relies on userspace apps to actually purge the
registrations themselves, this is not something that the kernel should
rely on.
Add a call to cfg80211_mlme_purge_registrations() to forcefully remove
any registrations left over prior to switching the iftype.
Cc: stable@vger.kernel.org
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190828211110.15005-1-denkenz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
commit 1222a16014 ("nl80211: Fix possible Spectre-v1 for CQM
RSSI thresholds") was incomplete and requires one more fix to
prevent accessing to rssi_thresholds[n] because user can control
rssi_thresholds[i] values to make i reach to n. For example,
rssi_thresholds = {-400, -300, -200, -100} when last is -34.
Cc: stable@vger.kernel.org
Fixes: 1222a16014 ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://lore.kernel.org/r/20190908005653.17433-1-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make it possibly for drivers to adjust the default max_mtu
by storing it in the hardware struct and using that value
for all interfaces.
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Link: https://lore.kernel.org/r/1567738137-31748-1-git-send-email-wgong@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With the help of boolinit.cocci, we use !nl80211_reg_change_event_fill
instead of (nl80211_reg_change_event_fill == false). Meanwhile, Clean
up the code.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Link: https://lore.kernel.org/r/1567657537-65472-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we expire an inactive station, try to send it a deauth. This
helps if it's actually still around, and just has issues with
beacon distribution (or we do), and it will not also remove us.
Then, if we have shared state, this may not be reset properly,
causing problems; for example, we saw a case where aggregation
sessions weren't removed properly (due to the TX start being
offloaded to firmware and it relying on deauth for stop), causing
a lot of traffic to get lost due to the SN reset after remove/add
of the peer.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20190830112451.21655-9-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We already assume that key is not NULL and dereference it in a few
other places before we check whether it is NULL, so the check is
unnecessary. Remove it.
Fixes: 96fc6efb9a ("mac80211: IEEE 802.11 Extended Key ID support")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20190830112451.21655-8-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In case we got a fw restart while roaming from encrypted AP to
non-encrypted one, we might end up with hitting a warning on the pending
counter crypto_tx_tailroom_pending_dec having a non-zero value.
The following comment taken from net/mac80211/key.c explains the rational
for the delayed tailroom needed:
/*
* The reason for the delayed tailroom needed decrementing is to
* make roaming faster: during roaming, all keys are first deleted
* and then new keys are installed. The first new key causes the
* crypto_tx_tailroom_needed_cnt to go from 0 to 1, which invokes
* the cost of synchronize_net() (which can be slow). Avoid this
* by deferring the crypto_tx_tailroom_needed_cnt decrementing on
* key removal for a while, so if we roam the value is larger than
* zero and no 0->1 transition happens.
*
* The cost is that if the AP switching was from an AP with keys
* to one without, we still allocate tailroom while it would no
* longer be needed. However, in the typical (fast) roaming case
* within an ESS this usually won't happen.
*/
The next flow lead to the warning eventually reported as a bug:
1. Disconnect from encrypted AP
2. Set crypto_tx_tailroom_pending_dec = 1 for the key
3. Schedule work
4. Reconnect to non-encrypted AP
5. Add a new key, setting the tailroom counter = 1
6. Got FW restart while pending counter is set ---> hit the warning
While on it, the ieee80211_reset_crypto_tx_tailroom() func was merged into
its single caller ieee80211_reenable_keys (previously called
ieee80211_enable_keys). Also, we reset the crypto_tx_tailroom_pending_dec
and remove the counters warning as we just reset both.
Signed-off-by: Lior Cohen <lior2.cohen@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20190830112451.21655-7-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we reach this point, the key cannot be NULL. Remove the condition
that suggests otherwise.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20190830112451.21655-6-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
"HE/HT/VHT" is a bit confusing since really the order of
development (and possible support) is different - change
this to "HT/VHT/HE".
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20190830112451.21655-4-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the RFKILL subsystem isn't available, then rfkill_blocked()
always returns false. In the case of hardware rfkill this will
be wrong though, as if the hardware reported being killed then
it cannot operate any longer.
Since we only ever call the rfkill_sync work in this case, just
rename it to rfkill_block and always pass "true" for the blocked
parameter, rather than passing rfkill_blocked().
We rely on the underlying driver to still reject any new attempt
to bring up the device by itself.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20190830112451.21655-2-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This fixes was missed in parsing the vht capabilities max bw
support.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Fixes: e80d642552 ("mac80211: copy VHT EXT NSS BW Support/Capable data to station")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20190830114057.22197-1-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The boundary value used for the 6G band was incorrect as it would
result in invalid 6G channel number for certain frequencies.
Reported-by: Amar Singhal <asinghal@codeaurora.org>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1567510772-24263-1-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds support for packet mirroring and redirection. The
nft_fwd_dup_netdev_offload() function configures the flow_action object
for the fwd and the dup actions.
Extend nft_flow_rule_destroy() to release the net_device object when the
flow_rule object is released, since nft_fwd_dup_netdev_offload() bumps
the net_device reference counter.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: wenxu <wenxu@ucloud.cn>
Register a new synproxy stateful object type into the stateful object
infrastructure.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump
a transport thresholds info.
Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds.
Fixes: 8add543e36 ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero,
it would make no progress inside the loop in hhf_dequeue() thus
kernel would get stuck.
Fix this by checking this corner case in hhf_change().
Fixes: 10239edf86 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com
Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com
Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Terry Lam <vtlam@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At least sch_red and sch_tbf don't implement ->tcf_block()
while still have a non-zero tc "class".
Instead of adding nop implementations to each of such qdisc's,
we can just relax the check of cops->tcf_block() in
tc_bind_tclass(). They don't support TC filter anyway.
Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the
device reset policy on driver probe.
This parameter is useful in conjunction with the existing
'fw_load_policy' parameter.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end.
In fact, NLMSG_DONE is sent only at the end of a dump.
Libraries like libnl will wait forever for NLMSG_DONE.
Fixes: 949f1e39a6 ("bridge: mdb: notify on router port add and del")
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add nft_offload_init() and nft_offload_exit() function to deal with the
init and the exit path of the offload infrastructure.
Rename nft_indr_block_get_and_ing_cmd() to nft_indr_block_cb().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nft_offload_ctx structure is much too large to put on the
stack:
net/netfilter/nf_tables_offload.c:31:23: error: stack frame size of 1200 bytes in function 'nft_flow_rule_create' [-Werror,-Wframe-larger-than=]
Use dynamic allocation here, as we do elsewhere in the same
function.
Fixes: c9626a2cbd ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The "newobj" is an error pointer so we can't pass it to kfree(). It
doesn't need to be freed so we can remove that and I also renamed the
error label.
Fixes: d62d0ba97b ("netfilter: nf_tables: Introduce stateful object update operation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Unlike normal TCP code TLS has to touch the cache lines
it copies into to fill header info. On memory-heavy workloads
having non temporal stores and normal accesses targeting
the same cache line leads to significant overhead.
Measured 3% overhead running 3600 round robin connections
with additional memory heavy workload.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For TLS device offload the tag/message authentication code are
filled in by the device. The kernel merely reserves space for
them. Because device overwrites it, the contents of the tag make
do no matter. Current code tries to save space by reusing the
header as the tag. This, however, leads to an additional frag
being created and defeats buffer coalescing (which trickles
all the way down to the drivers).
Remove this optimization, and try to allocate the space for
the tag in the usual way, leave the memory uninitialized.
If memory allocation fails rewind the record pointer so that
we use the already copied user data as tag.
Note that the optimization was actually buggy, as the tag
for TLS 1.2 is 16 bytes, but header is just 13, so the reuse
may had looked past the end of the page..
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All modifications to TLS record list happen under the socket
lock. Since records form an ordered queue readers are only
concerned about elements being removed, additions can happen
concurrently.
Use RCU primitives to ensure the correct access types
(READ_ONCE/WRITE_ONCE).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's generally more cache friendly to walk arrays in order,
especially those which are likely not in cache.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2019-09-06
Here's the main bluetooth-next pull request for the 5.4 kernel.
- Cleanups & fixes to btrtl driver
- Fixes for Realtek devices in btusb, e.g. for suspend handling
- Firmware loading support for BCM4345C5
- hidp_send_message() return value handling fixes
- Added support for utilizing Fast Advertising Interval
- Various other minor cleanups & fixes
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Historically, support for frag_list packets entering skb_segment() was
limited to frag_list members terminating on exact same gso_size
boundaries. This is verified with a BUG_ON since commit 89319d3801
("net: Add frag_list support to skb_segment"), quote:
As such we require all frag_list members terminate on exact MSS
boundaries. This is checked using BUG_ON.
As there should only be one producer in the kernel of such packets,
namely GRO, this requirement should not be difficult to maintain.
However, since commit 6578171a7f ("bpf: add bpf_skb_change_proto helper"),
the "exact MSS boundaries" assumption no longer holds:
An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
leaves the frag_list members as originally merged by GRO with the
original 'gso_size'. Example of such programs are bpf-based NAT46 or
NAT64.
This lead to a kernel BUG_ON for flows involving:
- GRO generating a frag_list skb
- bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
- skb_segment() of the skb
See example BUG_ON reports in [0].
In commit 13acc94eff ("net: permit skb_segment on head_frag frag_list skb"),
skb_segment() was modified to support the "gso_size mangling" case of
a frag_list GRO'ed skb, but *only* for frag_list members having
head_frag==true (having a page-fragment head).
Alas, GRO packets having frag_list members with a linear kmalloced head
(head_frag==false) still hit the BUG_ON.
This commit adds support to skb_segment() for a 'head_skb' packet having
a frag_list whose members are *non* head_frag, with gso_size mangled, by
disabling SG and thus falling-back to copying the data from the given
'head_skb' into the generated segmented skbs - as suggested by Willem de
Bruijn [1].
Since this approach involves the penalty of skb_copy_and_csum_bits()
when building the segments, care was taken in order to enable this
solution only when required:
- untrusted gso_size, by testing SKB_GSO_DODGY is set
(SKB_GSO_DODGY is set by any gso_size mangling functions in
net/core/filter.c)
- the frag_list is non empty, its item is a non head_frag, *and* the
headlen of the given 'head_skb' does not match the gso_size.
[0]
https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/
[1]
https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/
Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a re-post of previous patch wrote by David Miller[1].
Phil Karn reported[2] that on busy networks with lots of unresolved
multicast routing entries, the creation of new multicast group routes
can be extremely slow and unreliable.
The reason is we hard-coded multicast route entries with unresolved source
addresses(cache_resolve_queue_len) to 10. If some multicast route never
resolves and the unresolved source addresses increased, there will
be no ability to create new multicast route cache.
To resolve this issue, we need either add a sysctl entry to make the
cache_resolve_queue_len configurable, or just remove cache_resolve_queue_len
limit directly, as we already have the socket receive queue limits of mrouted
socket, pointed by David.
>From my side, I'd perfer to remove the cache_resolve_queue_len limit instead
of creating two more(IPv4 and IPv6 version) sysctl entry.
[1] https://lkml.org/lkml/2018/7/22/11
[2] https://lkml.org/lkml/2018/7/21/343
v3: instead of remove cache_resolve_queue_len totally, let's only remove
the hard code limit when allocate the unresolved cache, as Eric Dumazet
suggested, so we don't need to re-count it in other places.
v2: hold the mfc_unres_lock while walking the unresolved list in
queue_count(), as Nikolay Aleksandrov remind.
Reported-by: Phil Karn <karn@ka9q.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes a stupid bug I recently introduced...
ip6_route_info_create() returns an ERR_PTR(err) and not a NULL on error.
Fixes: d55a2e374a ("net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)'")
Cc: David Ahern <dsahern@gmail.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_diag_get_aux_size() can be called with sockets in any state.
icsk_ulp_ops is only present for full sockets.
For SYN_RECV or TIME_WAIT ones we would access garbage.
Fixes: 61723b3932 ("tcp: ulp: add functions to dump ulp-specific information")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Luke Hsiao <lukehsiao@google.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need for fib_notifier_ops to be in struct net. It is used only by
fib_notifier as a private data. Use net_generic to introduce per-net
fib_notifier struct and move fib_notifier_ops there.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Add nft_reg_store64() and nft_reg_load64() helpers, from Ander Juaristi.
2) Time matching support, also from Ander Juaristi.
3) VLAN support for nfnetlink_log, from Michael Braun.
4) Support for set element deletions from the packet path, also from Ander.
5) Remove __read_mostly from conntrack spinlock, from Li RongQing.
6) Support for updating stateful objects, this also includes the initial
client for this infrastructure: the quota extension. A follow up fix
for the control plane also comes in this batch. Patches from
Fernando Fernandez Mancera.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of invoking struct bpf_prog::bpf_func directly, use the
BPF_PROG_RUN macro.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add the ability to use unaligned chunks in the AF_XDP umem. By
relaxing where the chunks can be placed, it allows to use an
arbitrary buffer size and place whenever there is a free
address in the umem. Helps more seamless DPDK AF_XDP driver
integration. Support for i40e, ixgbe and mlx5e, from Kevin and
Maxim.
2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
application can wake up the kernel for rx/tx processing which
avoids busy-spinning of the latter, useful when app and driver
is located on the same core. Support for i40e, ixgbe and mlx5e,
from Magnus and Maxim.
3) bpftool fixes for printf()-like functions so compiler can actually
enforce checks, bpftool build system improvements for custom output
directories, and addition of 'bpftool map freeze' command, from Quentin.
4) Support attaching/detaching XDP programs from 'bpftool net' command,
from Daniel.
5) Automatic xskmap cleanup when AF_XDP socket is released, and several
barrier/{read,write}_once fixes in AF_XDP code, from Björn.
6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
inclusion as well as libbpf versioning improvements, from Andrii.
7) Several new BPF kselftests for verifier precision tracking, from Alexei.
8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya.
9) And more BPF kselftest improvements all over the place, from Stanislav.
10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.
11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.
12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.
13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.
14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.
15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
Peter, Wei, Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hidp_send_message was changed to return non-zero values on success,
which some other bits did not expect. This caused spurious errors to be
propagated through the stack, breaking some drivers, such as hid-sony
for the Dualshock 4 in Bluetooth mode.
As pointed out by Dan Carpenter, hid-microsoft directly relied on that
assumption as well.
Fixes: 48d9cc9d85 ("Bluetooth: hidp: Let hidp_send_message return number of queued bytes")
Signed-off-by: Dan Elkouby <streetwalkermc@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Whenever MQ is not used on a multiqueue device, we experience
serious reordering problems. Bisection found the cited
commit.
The issue can be described this way :
- A single qdisc hierarchy is shared by all transmit queues.
(eg : tc qdisc replace dev eth0 root fq_codel)
- When/if try_bulk_dequeue_skb_slow() dequeues a packet targetting
a different transmit queue than the one used to build a packet train,
we stop building the current list and save the 'bad' skb (P1) in a
special queue. (bad_txq)
- When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
skb (P1), it checks if the associated transmit queues is still in frozen
state. If the queue is still blocked (by BQL or NIC tx ring full),
we leave the skb in bad_txq and return NULL.
- dequeue_skb() calls q->dequeue() to get another packet (P2)
The other packet can target the problematic queue (that we found
in frozen state for the bad_txq packet), but another cpu just ran
TX completion and made room in the txq that is now ready to accept
new packets.
- Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent
at next round. In practice P2 is the lead of a big packet train
(P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/
To solve this problem, we have to block the dequeue process as long
as the first packet in bad_txq can not be sent. Reordering issues
disappear and no side effects have been seen.
Fixes: a53851e2c3 ("net: sched: explicit locking in gso_cpu fallback")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2019-09-05
1) Several xfrm interface fixes from Nicolas Dichtel:
- Avoid an interface ID corruption on changelink.
- Fix wrong intterface names in the logs.
- Fix a list corruption when changing network namespaces.
- Fix unregistation of the underying phydev.
2) Fix a potential warning when merging xfrm_plocy nodes.
From Florian Westphal.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Add 2 new attributes
TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support
so that tc utility can use them for 64bit rate and peakrate value to
break the 32bit limit, and still keep the backward binary compatibility.
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:
recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
Will be translated to the following tc rule:
$ tc filter add dev dev1 ingress \
prio 1 chain 0 proto ip \
flower tcp ct_state -trk \
action ct pipe \
action goto chain 2
Received packets will first travel though tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the proccessing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.
To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct mgmt_rp_get_connections {
...
struct mgmt_addr_info addr[0];
} __packed;
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
So, replace the following form:
sizeof(*rp) + (i * sizeof(struct mgmt_addr_info));
with:
struct_size(rp, addr, i)
Also, notice that, in this case, variable rp_len is not necessary,
hence it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Static variable header_ops, of type header_ops, is used only once, when
it is assigned to field header_ops of a variable having type net_device.
This corresponding field is declared as const in the definition of
net_device. Hence make header_ops constant as well to protect it from
unnecessary modification.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Changes made to add support for fast advertising interval
as per core 4.1 specification, section 9.3.11.2.
A peripheral device entering any of the following GAP modes and
sending either non-connectable advertising events or scannable
undirected advertising events should use adv_fast_interval2
(100ms - 150ms) for adv_fast_period(30s).
- Non-Discoverable Mode
- Non-Connectable Mode
- Limited Discoverable Mode
- General Discoverable Mode
Signed-off-by: Spoorthi Ravishankar Koppad <spoorthix.k@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When accessing the members of an XDP socket, the control mutex should
be held. This commit fixes that.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: a36b38aa2a ("xsk: add sock_diag interface for AF_XDP")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Prior the state variable was introduced by Ilya, the dev member was
used to determine whether the socket was bound or not. However, when
dev was read, proper SMP barriers and READ_ONCE were missing. In order
to address the missing barriers and READ_ONCE, we start using the
state variable as a point of synchronization. The state member
read/write is paired with proper SMP barriers, and from this follows
that the members described above does not need READ_ONCE if used in
conjunction with state check.
In all syscalls and the xsk_rcv path we check if state is
XSK_BOUND. If that is the case we do a SMP read barrier, and this
implies that the dev, umem and all rings are correctly setup. Note
that no READ_ONCE are needed for these variable if used when state is
XSK_BOUND (plus the read barrier).
To summarize: The members struct xdp_sock members dev, queue_id, umem,
fq, cq, tx, rx, and state were read lock-less, with incorrect barriers
and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and state
are read lock-less. When these members are updated, WRITE_ONCE is
used. When read, READ_ONCE are only used when read outside the control
mutex (e.g. mmap) or, not synchronized with the state member
(XSK_BOUND plus smp_rmb())
Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE, due
to the introduce state synchronization (XSK_BOUND plus smp_rmb()).
Introducing the state check also fixes a race, found by syzcaller, in
xsk_poll() where umem could be accessed when stale.
Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b3f ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The umem member of struct xdp_sock is read outside of the control
mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
potential store-tearing.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 423f38329d ("xsk: add umem fill queue support and mmap")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid
potential store-tearing. These members are read outside of the control
mutex in the mmap implementation.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 37b076933a ("xsk: add missing write- and data-dependency barrier")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Not all objects have an update operation. If the object type doesn't
implement an update operation and the user tries to update it will hit
EOPNOTSUPP.
Fixes: d62d0ba97b ("netfilter: nf_tables: Introduce stateful object update operation")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When creating a v4 route that uses a v6 nexthop from a nexthop group.
Allow the kernel to properly send the nexthop as v6 via the RTA_VIA
attribute.
Broken behavior:
$ ip nexthop add via fe80::9 dev eth0
$ ip nexthop show
id 1 via fe80::9 dev eth0 scope link
$ ip route add 4.5.6.7/32 nhid 1
$ ip route show
default via 10.0.2.2 dev eth0
4.5.6.7 nhid 1 via 254.128.0.0 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
$
Fixed behavior:
$ ip nexthop add via fe80::9 dev eth0
$ ip nexthop show
id 1 via fe80::9 dev eth0 scope link
$ ip route add 4.5.6.7/32 nhid 1
$ ip route show
default via 10.0.2.2 dev eth0
4.5.6.7 nhid 1 via inet6 fe80::9 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
$
v2, v3: Addresses code review comments from David Ahern
Fixes: dcb1ecb50e (“ipv4: Prepare for fib6_nh from a nexthop object”)
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use %*ph format to print small buffer as hex string.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl1vrJITHG1rbEBwZW5n
dXRyb25peC5kZQAKCRBaxiGjkeSdIC8BB/98XcWiaInD+SM6UjD2dVd1r0zhPKJS
WBK58G81+op3YP4DY8Iy+C24uZBlSlutVGoD/PIrZF39xXsnOtJuMVHC4LvtdADC
30uI/61JQNEjuX2AiTFudqDvYjZZKZ28HLqEnO2pWk3dMVL3+fkS3i7VQR7KJ/Gr
BYM6EzCdkbuWW/zsAVbKLJ8NswVmcdjP7eSK+exKppoWMtgCglZw1X6iP5YXDnbK
h3dGs687u8RfUra7j7vgnJzyQU4draMPsabaLDT5qw1PgYQ3k8MTVMBlULR0+HHO
qkBqumRwfOxay0z0XOgRuWrICKTH/b0SRLp3H53ZyfDo6+4TC9KGHRgX
=gwfZ
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-5.4-20190904' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2019-09-04 j1939
this is a pull request for net-next/master consisting of 21 patches.
the first 12 patches are by me and target the CAN core infrastructure.
They clean up the names of variables , structs and struct members,
convert can_rx_register() to use max() instead of open coding it and
remove unneeded code from the can_pernet_exit() callback.
The next three patches are also by me and they introduce and make use of
the CAN midlayer private structure. It is used to hold protocol specific
per device data structures.
The next patch is by Oleksij Rempel, switches the
&net->can.rcvlists_lock from a spin_lock() to a spin_lock_bh(), so that
it can be used from NAPI (soft IRQ) context.
The next 4 patches are by Kurt Van Dijck, he first updates his email
address via mailmap and then extends sockaddr_can to include j1939
members.
The final patch is the collective effort of many entities (The j1939
authors: Oliver Hartkopp, Bastian Stender, Elenita Hinds, kbuild test
robot, Kurt Van Dijck, Maxime Jayat, Robin van der Gracht, Oleksij
Rempel, Marc Kleine-Budde). It adds support of SAE J1939 protocol to the
CAN networking stack.
SAE J1939 is the vehicle bus recommended practice used for communication
and diagnostics among vehicle components. Originating in the car and
heavy-duty truck industry in the United States, it is now widely used in
other parts of the world.
P.S.: This pull request doesn't invalidate my last pull request:
"pull-request: can-next 2019-09-03".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A change to the core nla helpers was missed during the push of
the nexthop changes. rt6_fill_node_nexthop should be calling
nla_nest_start_noflag not nla_nest_start. Currently, iproute2
does not print multipath data because of parsing issues with
the attribute.
Fixes: f88d8ea67f ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_map and ULP only work together when ULP is loaded after the sock
map is loaded. In the sock_map case we added a check for this to fail
the load if ULP is already set. However, we missed the check on the
sock_hash side.
Add a ULP check to the sock_hash update path.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: syzbot+7a6ee4d0078eac6bf782@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comment we have is just repeating what the code does.
Include the *reason* for the condition instead.
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If retransmit record hint fall into the cleanup window we will
free it by just walking the list. No need to duplicate the code.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS code has a number of #ifdefs which make the code a little
harder to follow. Recent fixes removed the ifdef around the
TLS_HW define, so we can switch to the often used pattern
of defining tls_device functions as empty static inlines
in the header when CONFIG_TLS_DEVICE=n.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On setsockopt path we need to hold device_offload_lock from
the moment we check netdev is up until the context is fully
ready to be added to the tls_device_list.
No need to hold it around the get_netdev_for_sock().
Change the code and remove the confusing comment.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reusing parts of error path for normal exit will make
next commit harder to read, untangle the two.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we already have the pointer to the full original sk_proto
stored use that instead of storing all individual callback
pointers as well.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IN_MULTICAST's primary intent is as a uapi macro.
Elsewhere in the kernel we use ipv4_is_multicast consistently.
This patch unifies linux's multicast checks to use that function
rather than this macro.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable port_rate is being initialized with a value that is never read
and is being re-assigned a little later on. The assignment is redundant
and hence can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2019-09-05
Here are a few more Bluetooth fixes for 5.3. I hope they can still make
it. There's one USB ID addition for btusb, two reverts due to discovered
regressions, and two other important fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit c49a8682fc.
There are devices which require low connection intervals for usable operation
including keyboards and mice. Forcing a static connection interval for
these types of devices has an impact in latency and causes a regression.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There is a subtle change in behaviour introduced by:
commit c7a1ce397a
'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create'
Before that patch /proc/net/ipv6_route includes:
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
Afterwards /proc/net/ipv6_route includes:
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo
ie. the above commit causes the ::1/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000).
AFAICT, this is incorrect since these routes are *not* coming from RA's.
As such, this patch restores the old behaviour.
Fixes: c7a1ce397a ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create")
Cc: David Ahern <dsahern@gmail.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transport should use its own pf_retrans to do the error_count
check, instead of asoc's. Otherwise, it's meaningless to make
pf_retrans per transport.
Fixes: 5aa93bcf66 ("sctp: Implement quick failover draft from tsvwg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a misplaced traceline in rxrpc_input_packet() which is looking at a
packet that just got released rather than the replacement packet.
Fix this by moving the traceline after the assignment that moves the new
packet pointer to the actual packet pointer.
Fixes: d0d5c0cd1e ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SAE J1939 is the vehicle bus recommended practice used for communication
and diagnostics among vehicle components. Originating in the car and
heavy-duty truck industry in the United States, it is now widely used in
other parts of the world.
J1939, ISO 11783 and NMEA 2000 all share the same high level protocol.
SAE J1939 can be considered the replacement for the older SAE J1708 and
SAE J1587 specifications.
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: Elenita Hinds <ecathinds@gmail.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Signed-off-by: Robin van der Gracht <robin@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The size of this structure will be increased with J1939 support. To stay
binary compatible, the CAN_REQUIRED_SIZE macro is introduced for
existing CAN protocols.
Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The can_rx_unregister() can be called from NAPI (soft IRQ) context, at least
by j1939 stack. This leads to potential dead lock with &net->can.rcvlists_lock
called from can_rx_register:
===============================================================================
WARNING: inconsistent lock state
4.19.0-20181029-1-g3e67f95ba0d3 #3 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
testj1939/224 [HC0[0]:SC1[1]:HE1:SE0] takes:
1ad0fda3 (&(&net->can.rcvlists_lock)->rlock){+.?.}, at: can_rx_unregister+0x4c/0x1ac
{SOFTIRQ-ON-W} state was registered at:
lock_acquire+0xd0/0x1f4
_raw_spin_lock+0x30/0x40
can_rx_register+0x5c/0x14c
j1939_netdev_start+0xdc/0x1f8
j1939_sk_bind+0x18c/0x1c8
__sys_bind+0x70/0xb0
sys_bind+0x10/0x14
ret_fast_syscall+0x0/0x28
0xbedc9b64
irq event stamp: 2440
hardirqs last enabled at (2440): [<c01302c0>] __local_bh_enable_ip+0xac/0x184
hardirqs last disabled at (2439): [<c0130274>] __local_bh_enable_ip+0x60/0x184
softirqs last enabled at (2412): [<c08b0bf4>] release_sock+0x84/0xa4
softirqs last disabled at (2415): [<c013055c>] irq_exit+0x100/0x1b0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&net->can.rcvlists_lock)->rlock);
<Interrupt>
lock(&(&net->can.rcvlists_lock)->rlock);
*** DEADLOCK ***
2 locks held by testj1939/224:
#0: 168eb13b (rcu_read_lock){....}, at: netif_receive_skb_internal+0x3c/0x350
#1: 168eb13b (rcu_read_lock){....}, at: can_receive+0x88/0x1c0
===============================================================================
To avoid this situation, we should use spin_lock_bh() instead of spin_lock().
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Since using the "struct can_ml_priv" for the per device "struct
dev_rcv_lists" the call can_dev_rcv_lists_find() cannot fail anymore.
This patch simplifies af_can by removing the NULL pointer checks from
the dev_rcv_lists returned by can_dev_rcv_lists_find().
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch removes the old method of allocating the per device protocol
specific memory via a netdevice_notifier. This had the drawback, that
the allocation can fail, leading to a lot of null pointer checks in the
code. This also makes the live cycle management of this memory quite
complicated.
This patch switches from the allocating the struct can_dev_rcv_lists in
a NETDEV_REGISTER call to using the dev->ml_priv, which is allocated by
the driver since the previous patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch introduces the CAN midlayer private structure ("struct
can_ml_priv") which should be used to hold protocol specific per device
data structures. For now it's only member is "struct can_dev_rcv_lists".
The CAN midlayer private is allocated via alloc_netdev()'s private and
assigned to "struct net_device::ml_priv" during device creation. This is
done transparently for CAN drivers using alloc_candev(). The slcan, vcan
and vxcan drivers which are not using alloc_candev() have been adopted
manually. The memory layout of the netdev_priv allocated via
alloc_candev() will looke like this:
+-------------------------+
| driver's priv |
+-------------------------+
| struct can_ml_priv |
+-------------------------+
| array of struct sk_buff |
+-------------------------+
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The networking core takes care and unregisters every network device in
a namespace before calling the can_pernet_exit() hook. This patch
removes the unneeded cleanup.
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch replaces an open coded max by the proper kernel define max().
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch gives the variables holding the CAN receiver and the receiver
list a better name by renaming them from "r to "rcv" and "rl" to
"recv_list".
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch add the commonly used prefix "can_" to the find_dev_rcv_lists()
function and moves the "find" to the end, as the function returns a struct
can_dev_rcv_list. This improves the overall readability of the code.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch add the commonly used prefix "can_" to the find_rcv_list()
function and add the "find" to the end, as the function returns a struct
rcv_list. This improves the overall readability of the code.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch gives the variables holding the CAN per device receive filter lists
a better name by renaming them from "d" to "dev_rcv_lists".
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch gives the variables holding the CAN receive filter lists a
better name by renaming them from "d" to "dev_rcv_lists".
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch improves the code reability by removing the redundant "can_"
prefix from the members of struct netns_can (as the struct netns_can itself
is the member "can" of the struct net.)
The conversion is done with:
sed -i \
-e "s/struct can_dev_rcv_lists \*can_rx_alldev_list;/struct can_dev_rcv_lists *rx_alldev_list;/" \
-e "s/spinlock_t can_rcvlists_lock;/spinlock_t rcvlists_lock;/" \
-e "s/struct timer_list can_stattimer;/struct timer_list stattimer; /" \
-e "s/can\.can_rx_alldev_list/can.rx_alldev_list/g" \
-e "s/can\.can_rcvlists_lock/can.rcvlists_lock/g" \
-e "s/can\.can_stattimer/can.stattimer/g" \
include/net/netns/can.h \
net/can/*.[ch]
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch rename the variables holding the CAN statistics (can_stats
and can_pstats) to pkg_stats and rcv_lists_stats which reflect better
their meaning.
The conversion is done with:
sed -i \
-e "s/can_stats\([^_]\)/pkg_stats\1/g" \
-e "s/can_pstats/rcv_lists_stats/g" \
net/can/proc.c
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch rename the variables holding the CAN statistics (can_stats
and can_pstats) to pkg_stats and rcv_lists_stats which reflect better
their meaning.
The conversion is done with:
sed -i \
-e "s/can_stats\([^_]\)/pkg_stats\1/g" \
-e "s/can_pstats/rcv_lists_stats/g" \
net/can/af_can.c
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch gives the members of the struct netns_can that are holding
the statistics a sensible name, by renaming struct netns_can::can_stats
into struct netns_can::pkg_stats and struct netns_can::can_pstats into
struct netns_can::rcv_lists_stats.
The conversion is done with:
sed -i \
-e "s:\(struct[^*]*\*\)can_stats;.*:\1pkg_stats;:" \
-e "s:\(struct[^*]*\*\)can_pstats;.*:\1rcv_lists_stats;:" \
-e "s/can\.can_stats/can.pkg_stats/g" \
-e "s/can\.can_pstats/can.rcv_lists_stats/g" \
net/can/*.[ch] \
include/net/netns/can.h
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch renames both "struct s_stats" and "struct s_pstats", to
"struct can_pkg_stats" and "struct can_rcv_lists_stats" to better
reflect their meaning and improve code readability.
The conversion is done with:
sed -i \
-e "s/struct s_stats/struct can_pkg_stats/g" \
-e "s/struct s_pstats/struct can_rcv_lists_stats/g" \
net/can/*.[ch] \
include/net/netns/can.h
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Set up the default timeout for this new entry otherwise the garbage
collector might quickly remove it right after the flowtable insertion.
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If this flag is set, timeout and state are irrelevant to userspace.
Fixes: 90964016e5 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If IPv6 is disabled on boot (ipv6.disable=1), but nft_fib_inet ends up
dealing with a IPv6 packet, it causes a kernel panic in
fib6_node_lookup_1(), crashing in bad_page_fault.
The panic is caused by trying to deference a very low address (0x38
in ppc64le), due to ipv6.fib6_main_tbl = NULL.
BUG: Kernel NULL pointer dereference at 0x00000038
The kernel panic was reproduced in a host that disabled IPv6 on boot and
have to process guest packets (coming from a bridge) using it's ip6tables.
Terminate rule evaluation when packet protocol is IPv6 but the ipv6 module
is not loaded.
Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the infrastructure needed for the stateful object update
support.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>