Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h
The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.
The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-10-22
Here's probably the last bluetooth-next pull request for 4.4. Among
several other changes it contains the rest of the fixes & cleanups from
the Bluetooth UnplugFest (that didn't need to be hurried to 4.3).
- Refactoring & cleanups to 6lowpan code
- New USB ids for two Atheros controllers and BCM43142A0 from Broadcom
- Fix (quirk) for broken Broadcom BCM2045 controllers
- Support for latest Apple controllers
- Improvements to the vendor diagnostic message support
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-10-23
This series contains updates to i40e, i40evf, if_link, ixgbe and ixgbevf.
Anjali adds a workaround to drop any flow control frames from being
transmitted from any VSI, so that a malicious VF cannot send flow control
or PFC packets out on the wire. Also fixed a bug in debugfs by grabbing
the filter list lock before adding or deleting a filter.
Akeem fixes an issue where we were unconditionally returning VEB bridge
mode before allowing LB in the add VSI routine, resolve by checking if
the bridge is actually in VEB mode first.
Mitch fixed an issue where the incorrect structure was being used for
VLAN filter list, which meant the VLAN filter list did not get
processed correctly and VLAN filters would not be re-enabled after any
kind of reset.
Helin fixed a problem of possibly getting inconsistent flow control
status after a PF reset. The issue was requested_mode was being set
with a default value during probe, but the hardware state could be a
different value from this mode.
Carolyn fixed a problem where the driver output of the OEM version
string varied from the other tools.
Jean Sacren fixes up kernel documentation by fixing function header
comments to match actual variables used in the functions. Also
cleaned up variable initialization, when the variable would be
over-written immediately.
Hiroshi Shimanoto provides three patches to add "trusted" VF by adding
netlink directives and an NDO entry. Then implement these new controls
in ixgbe and ixgbevf. This series has gone through several iterations
to address all the suggested community changes and concerns.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the selection of a multipath route to use a flow-based
hash. This more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution
of traffic given enough flows.
Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
payload, if present.
Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for MPLS multipath routes.
Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
- 'struct mpls_nh' represents a mpls nexthop label forwarding entry
- moves mpls route and nexthop structures into internal.h
- A mpls_route can point to multiple mpls_nh structs
- the nexthops are maintained as a array (similar to ipv4 fib)
- In the process of restructuring, this patch also consistently changes
all labels to u8
- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib
- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch
- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
mpls_route_update though implemented to update based on dev, it was
never used that way. And the dev handling gets tricky with multiple
nexthops. Cannot match against any single nexthops dev. So, this patch
removes the unused 'dev' handling in mpls_route_update.
- dead route/path handling will be implemented in a subsequent patch
Example:
$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
nexthop as to 800 via inet 40.1.1.2 dev swp3
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netlink directives and ndo entry to trust VF user.
This controls the special permission of VF user.
The administrator will dedicatedly trust VF user to use some features
which impacts security and/or performance.
The administrator never turn it on unless VF user is fully trusted.
CC: Sy Jong Choi <sy.jong.choi@intel.com>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Multiple cpus can process duplicates of incoming ACK messages
matching a SYN_RECV request socket. This is a rare event under
normal operations, but definitely can happen.
Only one must win the race, otherwise corruption would occur.
To fix this without adding new atomic ops, we use logic in
inet_ehash_nolisten() to detect the request was present in the same
ehash bucket where we try to insert the new child.
If request socket was not found, we have to undo the child creation.
This actually removes a spin_lock()/spin_unlock() pair in
reqsk_queue_unlink() for the fast path.
Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently adding a new ipv4 address always cause the creation of the
related network route, with default metric. When a host has multiple
interfaces on the same network, multiple routes with the same metric
are created.
If the userspace wants to set specific metric on each routes, i.e.
giving better metric to ethernet links in respect to Wi-Fi ones,
the network routes must be deleted and recreated, which is error-prone.
This patch implements the support for IFA_F_NOPREFIXROUTE for ipv4
address. When an address is added with such flag set, no associated
network route is created, no network route is deleted when
said IP is gone and it's up to the user space manage such route.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.
Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If alpha is strictly reduced by alpha >> dctcp_shift_g and if alpha is less
than 1 << dctcp_shift_g, then alpha may never reach zero. For example,
given shift_g=4 and alpha=15, alpha >> dctcp_shift_g yields 0 and alpha
remains 15. The effect isn't noticeable in this case below cwnd=137, but
could gradually drive uncongested flows with leftover alpha down to
cwnd=137. A larger dctcp_shift_g would have a greater effect.
This change causes alpha=15 to drop to 0 instead of being decrementing by 1
as it would when alpha=16. However, it requires one less conditional to
implement since it doesn't have to guard against subtracting 1 from 0U. A
decay of 15 is not unreasonable since an equal or greater amount occurs at
alpha >= 240.
Signed-off-by: Andrew G. Shewmaker <agshew@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error condition -EAGAIN, which is signaled by throw routes, tells
the rules framework to walk on searching for next matches. If the walk
ends and we stop walking the rules with the result of a throw route we
have to translate the error conditions to -ENETUNREACH.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices. Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.
Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent fix for the vsock sock_put issue used the wrong
initializer for the transport spin_lock causing an issue when
running with lockdep checking.
Testing: Verified fix on kernel with lockdep enabled.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2015-10-22
1) Fix IPsec pre-encap fragmentation for GSO packets.
From Herbert Xu.
2) Fix some header checks in _decode_session6.
We skip the header informations if the data pointer points
already behind the header in question for some protocols.
This is because we call pskb_may_pull with a negative value
converted to unsigened int from pskb_may_pull in this case.
Skipping the header informations can lead to incorrect policy
lookups. From Mathias Krause.
3) Allow to change the replay threshold and expiry timer of a
state without having to set other attributes like replay
counter and byte lifetime. Changing these other attributes
may break the SA. From Michael Rossberg.
4) Fix pmtu discovery for local generated packets.
We may fail dispatch to the inner address family.
As a reault, the local error handler is not called
and the mtu value is not reported back to userspace.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
No driver implements port_fdb_getnext anymore, and port_fdb_dump is
preferred anyway, so remove this function from DSA.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not all switch chips support a Get Next operation to iterate on its FDB.
So add a more simple port_fdb_dump function for them.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
741a11d9e4 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if
oif is given. Hajime reported that this change breaks the Mobile IPv6
use case that wants to force the message through one interface yet use
the source address from another interface. Handle this case by only
adding the flag if oif is set and saddr is not set.
Fixes: 741a11d9e4 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
Cc: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* I merged net-next back to avoid a conflict with the
* cfg80211 scheduled scan API extensions
* preparations for better scan result timestamping
* regulatory cleanups
* mac80211 statistics cleanups
* a few other small cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJWJ6lbAAoJEDBSmw7B7bqraasP/Ryaa7zL10E+dOQtqBQHQeMe
olbrCUtTYltr4nnuESzh5WPeIVZBQ0DIduoLLF0IDSPVwE/NrbpFUVIMHvJvr+s7
rE9k8RB4P7BMTjf+mkDX1Od9kCKGkt4ezcyt/oNIsqM12SN9JQ99itwz6Mp94xCs
XKsiXJRh9f/8Qwd/74qQq1Va3UfGAVuKO8WpUe/A7TYTla8ZY20pv1D8kQKQzrFg
DwsMirjmHcUpobSjnPAAmZevRxdk6o0E+P7DYG172H2Tm8/EIMR/gYMnQeYW6HkA
lfMMDfAGmNvyRm8v1iuBLodREP4kn4VbhMSZDtH7D6FYfmJh5fSeG09bSe51G5Xh
zv/B8A1cCbWFqtQHp3wI6ml8VDyAhDc2Hvqb75KRn6FplIkEiszVP0y3cNHWiJVt
Ix6Sysoa6kQDXEgR50APeLJ3VI+/mhXmvIila4jP9PKhO14SDHrCoRQO62Z0COJ7
2E5Ir2KE8T+O9mSeuB7m8xD/t60HDd3q3tLZmH0Ps6xfxKf9y2hdZacbX4Hi5Mqk
2XxXZYnhAXUqZmZhmG3ajnEiB4UGMt21R7dIqNTaQ9chOGBkHqIZxPm82XtNb13h
yHILavGpUDT0z6OB2z8fxUcj4a4SrrK+aiIGh4iFpDR0Nu0IyZ5cPHXY2FfvJWmD
ZO74RMEpBodYR8BsV4yP
=uZ5N
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2015-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Here's another set of patches for the current cycle:
* I merged net-next back to avoid a conflict with the
* cfg80211 scheduled scan API extensions
* preparations for better scan result timestamping
* regulatory cleanups
* mac80211 statistics cleanups
* a few other small cleanups and fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In the vsock vmci_transport driver, sock_put wasn't safe to call
in interrupt context, since that may call the vsock destructor
which in turn calls several functions that should only be called
from process context. This change defers the callling of these
functions to a worker thread. All these functions were
deallocation of resources related to the transport itself.
Furthermore, an unused callback was removed to simplify the
cleanup.
Multiple customers have been hitting this issue when using
VMware tools on vSphere 2015.
Also added a version to the vmci transport module (starting from
1.0.2.0-k since up until now it appears that this module was
sharing version with vsock that is currently at 1.0.1.0-k).
Reviewed-by: Aditya Asarwade <asarwade@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying
the membership state to user-space. However, grabing the netlink table is
effectively a write_lock_irq(), and as such we should not be triggering
page-faults in the critical section.
This can be easily reproduced by the following snippet:
int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092));
This should work just fine, but currently triggers EFAULT and a possible
WARN_ON below handle_mm_fault().
Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side
lock. The write-lock was overkill in the first place, and the read-lock
allows page-faults just fine.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With use of lwtunnel, we can directly call dev_queue_xmit()
rather than calling netdev vport send operation.
Following change make tunnel vport code bit cleaner.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch fixes following sparse warning.
net/openvswitch/flow_netlink.c:583:30: warning: incorrect type in assignment (different base types)
net/openvswitch/flow_netlink.c:583:30: expected restricted __be16 [usertype] ipv4
net/openvswitch/flow_netlink.c:583:30: got int
Fixes: 6b26ba3a7d ("openvswitch: netlink attributes for IPv6 tunneling")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the addition of support for diagnostic feature, it makes sense to
increase the minor version of the Bluetooth core module.
The module version is not used anywhere, but it gives a nice extra
hint for debugging purposes.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Looking at current situation of memory management in 6lowpan receive
function I detected some invalid handling. After calling
lowpan_invoke_rx_handlers we will do a kfree_skb and then NET_RX_DROP on
error handling. We don't do this before, also on
skb_share_check/skb_unshare which might manipulate the reference
counters.
After running some 'grep -r "dev_add_pack" net/' to look how others
packet-layer receive callbacks works I detected that every subsystem do
a kfree_skb, then NET_RX_DROP without calling skb functions which
might manipulate the skb reference counters. This is the reason why we
should do the same here like all others subsystems. I didn't find any
documentation how the packet-layer receive callbacks handle NET_RX_DROP
return values either.
This patch will add a kfree_skb, then NET_RX_DROP handling for the
"trivial checks", in case of skb_share_check/skb_unshare the kfree_skb
call will be done inside these functions.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are a few places that don't explicitly check the connection
state before calling hci_disconnect(). To make this API do the right
thing take advantage of the new hci_abort_conn() API and also make
sure to only read the clock offset if we're really connected.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Convert the various places mapping connection state to
disconnect/cancel HCI command to use the new hci_abort_conn helper
API.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are several different places needing to make sure that a
connection gets disconnected or canceled. The exact action needed
depends on the connection state, so centralizing this logic can save
quite a lot of code duplication.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For connection parameters that are left around until a disconnection
we should at least clear any auto-connection properties. This way a
new Add Device call is required to re-set them after calling Unpair
Device.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tom Herbert added SIT support to GRO with commit
19424e052f ("sit: Add gro callbacks to sit_offload"),
later reverted by Herbert Xu.
The problem came because Tom patch was building GRO
packets without proper meta data : If packets were locally
delivered, we would not care.
But if packets needed to be forwarded, GSO engine was not
able to segment individual segments.
With the following patch, we correctly set skb->encapsulation
and inner network header. We also update gso_type.
Tested:
Server :
netserver
modprobe dummy
ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
arp -s 8.0.0.100 4e:32:51:04:47:e5
iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
ifconfig sixtofour0
sixtofour0 Link encap:IPv6-in-IPv4
inet6 addr: 2002:af6:798::1/128 Scope:Global
inet6 addr: 2002:af6:798::/128 Scope:Global
UP RUNNING NOARP MTU:1480 Metric:1
RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:20319631739 (20.3 GB) TX bytes:29529556 (29.5 MB)
Client :
netperf -H 2002:af6:798::1 -l 1000 &
Checked on server traffic copied on dummy0 and verify segments were
properly rebuilt, with proper IP headers, TCP checksums...
tcpdump on eth0 shows proper GRO aggregation takes place.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If userspace provides a ct action with no nested mark or label, then the
storage for these fields is zeroed. Later when actions are requested,
such zeroed fields are serialized even though userspace didn't
originally specify them. Fix the behaviour by ensuring that no action is
serialized in this case, and reject actions where userspace attempts to
set these fields with mask=0. This should make netlink marshalling
consistent across deserialization/reserialization.
Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
New, related connections are marked as such as part of ovs_ct_lookup(),
but they are not marked as "new" if the commit flag is used. Make this
consistent by setting the "new" flag whenever !nf_ct_is_confirmed(ct).
Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, 0-bits are generated in ct_state where the bit position is
undefined, and matches are accepted on these bit-positions. If userspace
requests to match the 0-value for this bit then it may expect only a
subset of traffic to match this value, whereas currently all packets
will have this bit set to 0. Fix this by rejecting such masks.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains four Netfilter fixes for net, they are:
1) Fix Kconfig dependencies of new nf_dup_ipv4 and nf_dup_ipv6.
2) Remove bogus test nh_scope in IPv4 rpfilter match that is breaking
--accept-local, from Xin Long.
3) Wait for RCU grace period after dropping the pending packets in the
nfqueue, from Florian Westphal.
4) Fix sleeping allocation while holding spin_lock_bh, from Nikolay Borisov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
if_nlmsg_size() overestimates the minimum allocation size of netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().
The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).
This patch-set "rightsizes" the protocol specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding this a argument to get_link_af_size op in rtnl_af_ops.
Bridge module already used filtering aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and thus removed.
Signed-off-by: Ronen Arad <ronen.arad@intel.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit d999297c3d ("tipc: reduce locking scope during packet reception")
we altered the packet retransmission function. Since then, when
restransmitting packets, we create a clone of the original buffer
using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of
the area we want to have copied, but also the smallest possible TIPC
packet size. The value of MIN_H_SIZE is 24.
Unfortunately, __pskb_copy() also has the effect that the headroom
of the cloned buffer takes the size MIN_H_SIZE. This is too small
for carrying the packet over the UDP tunnel bearer, which requires
a minimum headroom of 28 bytes. A change to just use pskb_copy()
lets the clone inherit the original headroom of 80 bytes, but also
assumes that the copied data area is of at least that size, something
that is not always the case. So that is not a viable solution.
We now fix this by adding a check for sufficient headroom in the
transmit function of udp_media.c, and expanding it when necessary.
Fixes: commit d999297c3d ("tipc: reduce locking scope during packet reception")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code for message reassembly is erroneously assuming that
the the first arriving fragment buffer always is linear, and then goes
ahead resetting the fragment list of that buffer in anticipation of
more arriving fragments.
However, if the buffer already happens to be non-linear, we will
inadvertently drop the already attached fragment list, and later
on trig a BUG() in __pskb_pull_tail().
We see this happen when running fragmented TIPC multicast across UDP,
something made possible since
commit d0f91938be ("tipc: add ip/udp media type")
We fix this by not resetting the fragment list when the buffer is non-
linear, and by initiatlizing our private fragment list tail pointer to
the tail of the existing fragment list.
Fixes: commit d0f91938be ("tipc: add ip/udp media type")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"openvswitch: Remove vport stats" removed the per-vport statistics, in
order to use the netdev's statistics fields.
"openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats
to user-space, by using the provided netdev_ops to collate them - but ovs
internal devices still use an unallocated dev->tstats field to count
packets, which are no longer exported by this api.
Allocate the dev->tstats field for ovs internal devices, and wire up
ndo_get_stats64 with the original implementation of
ovs_vport_get_stats().
On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs,
unmasking a full-on panic on arm64:
=============%<==============
[<ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch]
[<ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch]
[<ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch]
[<ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch]
[<ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch]
[<ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310
[<ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4
[<ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc
[<ffffffc0005e8a94>] genl_rcv+0x38/0x50
[<ffffffc0005e76c0>] netlink_unicast+0x164/0x210
[<ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368
[<ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c
[SNIP]
Kernel panic - not syncing: Fatal exception in interrupt
=============%<==============
Fixes: 8c876639c9 ("openvswitch: Remove vport stats.")
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6e28b00082 ("net: Fix vti use case with oif in dst lookups for IPv6")
is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them.
Fixes: 42a7b32b73 ("xfrm: Add oif to dst lookups")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default fix broadcast window size is currently set to 20 packets.
This is a very low value, set at a time when we were still testing on
10 Mb/s hubs, and a change to it is long overdue.
Commit 7845989cb4 ("net: tipc: fix stall during bclink wakeup procedure")
revealed a problem with this low value. For messages of importance LOW,
the backlog queue limit will be calculated to 30 packets, while a
single, maximum sized message of 66000 bytes, carried across a 1500 MTU
network consists of 46 packets.
This leads to the following scenario (among others leading to the same
situation):
1: Msg 1 of 46 packets is sent. 20 packets go to the transmit queue, 26
packets to the backlog queue.
2: Msg 2 of 46 packets is attempted sent, but rejected because there is
no more space in the backlog queue at this level. The sender is added
to the wakeup queue with a "pending packets chain size" number of 46.
3: Some packets in the transmit queue are acked and released. We try to
wake up the sender, but the pending size of 46 is bigger than the LOW
wakeup limit of 30, so this doesn't happen.
5: Subsequent acks releases all the remaining buffers. Each time we test
for the wakeup criteria and find that 46 still is larger than 30,
even after both the transmit and the backlog queues are empty.
6: The sender is never woken up and given a chance to send its message.
He is stuck.
We could now loosen the wakeup criteria (used by link_prepare_wakeup())
to become equal to the send criteria (used by tipc_link_xmit()), i.e.,
by ignoring the "pending packets chain size" value altogether, or we can
just increase the queue limits so that the criteria can be satisfied
anyway. There are good reasons (potentially multiple waiting senders) to
not opt for the former solution, so we choose the latter one.
This commit fixes the problem by giving the broadcast link window a
default value of 50 packets. We also introduce a new minimum link
window size BCLINK_MIN_WIN of 32, which is enough to always avoid the
described situation. Finally, in order to not break any existing users
which may set the window explicitly, we enforce that the window is set
to the new minimum value in case the user is trying to set it to
anything lower.
Fixes: 7845989cb4 ("net: tipc: fix stall during bclink wakeup procedure")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's only one user of this helper which can be replaces with a call
to hci_pend_le_action_lookup() and a check for params->explicit_connect.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There's no need to clear the HCI_CONN_ENCRYPT_PEND flag in
smp_failure. In fact, this may cause the encryption tracking to get
out of sync as this has nothing to do with HCI activity.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The hci_le_create_connection_cancel() function needs to use the hdev
pointer in many places so add a variable for it to avoid the need to
dereference the hci_conn every time.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Instead of doing all of the LE-specific handling in an else-branch in
unpair_device() create a 'done' label for the BR/EDR branch to jump to
and then remove the else-branch completely.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Use the new hci_conn_hash_lookup_le() API to look up LE connections.
This way we're guaranteed exact matches that also take into account
the address type.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Use the new hci_conn_hash_lookup_le() API to look up LE connections.
This way we're guaranteed exact matches that also take into account
the address type.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The mgmt code needs to convert from mgmt/L2CAP address types to HCI in
many places. Having a dedicated helper function for this simplifies
code by shortening it and removing unnecessary 'addr_type' variables.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>