If the actions (re)allocation fails, or the actions list is larger than the
maximum size, and the conntrack action is the last action when these
problems are hit, then references to helper modules may be leaked. Fix
the issue.
Fixes: cae3a26275 ("openvswitch: Allow attaching helpers to ct action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP is lacking proper np->opt cloning at accept() time.
TCP and DCCP use ipv6_dup_options() helper, do the same
in SCTP.
We might later factorize this code in a common helper to avoid
future mistakes.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This gets rid of the following compile warn:
net/mpls/mpls_iptunnel.c:40:5: warning: no previous prototype for
mpls_output [-Wmissing-prototypes]
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRM can deal with SYNACK messages, sent while listener socket
is not locked. We add proper rcu protection to __xfrm_sk_clone_policy()
and xfrm_sk_policy_lookup()
This might serve as the first step to remove xfrm.xfrm_policy_lock
use in fast path.
Fixes: fa76ce7328 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will soon switch sk->sk_policy[] to RCU protection,
as SYNACK packets are sent while listener socket is not locked.
This patch simply adds RCU grace period before struct xfrm_policy
freeing, and the corresponding rcu_head in struct xfrm_policy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A Linux PC is connected with another device over Bluetooth PAN using a
BNEP interface.
Whenever a packet is tried to be sent over the BNEP interface, the
function "bnep_net_xmit()" in "net/bluetooth/bnep/netdev.c" is called.
This function calls "bnep_net_mc_filter()", which checks (if the
destination address is multicast) if the address is set in a certain
multicast filter (&s->mc_filter). If it is not, then it is not sent out.
This filter is only changed in two other functions, found in
net/bluetooth/bnep/core.c": in "bnep_ctrl_set_mc_filter()", which is
only called if a message of type "BNEP_FILTER_MULTI_ADDR_SET" is
received. Otherwise, it is set in "bnep_add_connection()", where it is
set to a default value which only adds the broadcast address to the
filter:
set_bit(bnep_mc_hash(dev->broadcast), (ulong *) &s->mc_filter);
To sum up, if the BNEP interface does not receive any message of type
"BNEP_FILTER_MULTI_ADDR_SET", it will not send out any messages with
multicast destination addresses except for broadcast.
However, in the BNEP specification (page 27 in
http://grouper.ieee.org/groups/802/15/Bluetooth/BNEP.pdf), it is said
that per default, all multicast addresses should not be filtered, i.e.
the BNEP interface should be able to send packets with any multicast
destination address.
It seems that the default case is wrong: the multicast filter should not
block almost all multicast addresses, but should not filter out any.
This leads to the problem that e.g. Neighbor Solicitation messages sent
with Bluetooth PAN over the BNEP interface to a multicast destination
address other than broadcast are blocked and not sent out.
Therefore, in the default case, we set the mc_filter to ~0LL to not
filter out any multicast addresses.
Signed-off-by: Danny Schweizer <danny.schweizer@proofnet.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch reverts 6001d52 ("mac802154: tx: don't allow if down while
sync tx"). This has side effects with stop callback which flush the
transmit workqueue. The stop callback will wait until the workqueue is
flushed and holding the rtnl lock. That means it can happen that the stop
callback waits forever because it try to lock the rtnl mutex which is
already hold by stop callback.
Cc: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
CONFIG_NF_CONNTRACK=m
CONFIG_NF_DUP_IPV4=y
results in:
net/built-in.o: In function `nf_dup_ipv4':
>> (.text+0xd434f): undefined reference to `nf_conntrack_untracked'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If we attach the sk to the skb from nfnetlink_rcv_batch(), then
netlink_skb_destructor() will underflow the socket receive memory
counter and we get warning splat when releasing the socket.
$ cat /proc/net/netlink
sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
ffff8800ca903000 12 0 00000000 -54144 0 0 2 0 17942
^^^^^^
Rmem above shows an underflow.
And here below the warning splat:
[ 1363.815976] WARNING: CPU: 2 PID: 1356 at net/netlink/af_netlink.c:958 netlink_sock_destruct+0x80/0xb9()
[...]
[ 1363.816152] CPU: 2 PID: 1356 Comm: kworker/u16:1 Tainted: G W 4.4.0-rc1+ #153
[ 1363.816155] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
[ 1363.816160] Workqueue: netns cleanup_net
[ 1363.816163] 0000000000000000 ffff880119203dd0 ffffffff81240204 0000000000000000
[ 1363.816169] ffff880119203e08 ffffffff8104db4b ffffffff813d49a1 ffff8800ca771000
[ 1363.816174] ffffffff81a42b00 0000000000000000 ffff8800c0afe1e0 ffff880119203e18
[ 1363.816179] Call Trace:
[ 1363.816181] <IRQ> [<ffffffff81240204>] dump_stack+0x4e/0x79
[ 1363.816193] [<ffffffff8104db4b>] warn_slowpath_common+0x9a/0xb3
[ 1363.816197] [<ffffffff813d49a1>] ? netlink_sock_destruct+0x80/0xb9
skb->sk was only needed to lookup for the netns, however we don't need
this anymore since 633c9a840d ("netfilter: nfnetlink: avoid recurrent
netns lookups in call_batch") so this patch removes this manual socket
assignment to resolve this problem.
Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Pass the net pointer to the call_batch callback functions so we can skip
recurrent lookups.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Some users of rfkill, like NFC and cfg80211, use a dynamic name when
allocating rfkill, in those cases dev_name(). Therefore, the pointer
passed to rfkill_alloc() might not be valid forever, I specifically
found the case that the rfkill name was quite obviously an invalid
pointer (or at least garbage) when the wiphy had been renamed.
Fix this by making a copy of the rfkill name in rfkill_alloc().
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch will introduce a 6lowpan entry into the debugfs if enabled.
Inside this 6lowpan directory we create a subdirectories of all 6lowpan
interfaces to offer a per interface debugfs support.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduces register and unregister functionality for lowpan
interfaces. While register a lowpan interface there are several things
which need to be initialize by the 6lowpan subsystem. Upcoming
functionality need to register/unregister per interface components e.g.
debugfs entry.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When we're doing background scanning and connection attempts it's
possible we timeout trying to connect and go back to scanning again.
The timeout triggers a HCI_LE_Create_Connection_Cancel which will
trigger a Connection Complete with "Unknown Connection Identifier"
error status. Since we go back to scanning this isn't really a failure
and shouldn't be presented as such to user space through mgmt.
The exception to this is if the connection attempt was due to an
explicit request on an L2CAP socket (indicated by
params->explicit_connect being true). Since the socket will get an
error it's consistent to also notify the failure on mgmt in this case.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
All LE connections are now triggered through a preceding passive scan
and waiting for a connectable advertising report. This means we've got
the best possible guarantee that the device is within range and should
be able to request the controller to perform continuous scanning. This
way we minimize the risk that we miss out on any advertising packets.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 4.3+
We can simplify a lot of code by making sure hdev->cur_adv_instance is
always up-to-date. This allows e.g. the removal of the
get_current_adv_instance() helper function and the special
HCI_ADV_CURRENT value. This patch also makes selecting instance 0x00
explicit in the various calls where advertising instances aren't
enabled, e.g. when HCI_ADVERTISING is set or we've just finished
enabling LE.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The logic in powered_update_hci() to initialize the advertising data &
state is a bit more complicated than it needs to be. It was previously
not doing anything if HCI_LE_ENABLED wasn't set, but this was not
obvious by quickly looking at the code. Now the conditions for the
various actions are more explicit. Another simplification is due to
the fact that __hci_req_schedule_adv_instance() takes care of setting
hdev->cur_adv_instance so there's no need to set it before calling the
function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The hci_req_run() function already checks for empty cmd_q and bails
out if necessary. Also, req.cmd_q should really be treated as private
data of the request and not accessed directly.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The __hci_req_update_scan_rsp_data gets the instance to be updated
which should get passed to update_inst_scan_rsp_data() instead of
always enabling the current instance.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This flag just tells us whether hdev->adv_instances is empty or not.
We can equally well use the list_empty() function to get this
information.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The code in the Read Advertising Features mgmt command handler is
unnecessarily complicated. Clean it up and remove unnecessary
variables & branches.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The request to update HCI during power on is always coming either from
hdev->req_workqueue or through an ioctl, so it's safe to use
hci_req_sync for it. This way we also eliminate potential races with
incoming mgmt commands or other actions while powering on.
Part of this refactoring is the splitting of mgmt_powered() into
mgmt_power_on() and __mgmt_power_off() functions. The main reason is
the different requirements as far as hdev locking is concerned, as
highlighted with the __ prefix of the power off API.
Since the power on in the case of clearing the AUTO_OFF flag cannot be
done synchronously in the set_powered mgmt handler, the hci_power_on
work callback is extended to cover this (which also simplifies the
set_powered helper a lot).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We'll soon need this both in hci_request.c and mgmt.c so move it to
hci_request.c as a generic helper.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We'll soon need to update the EIR both from hci_request.c and mgmt.c
so move update_eir() as a more generic request helper to
hci_request.c.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We'll soon need this both from hci_request.c and mgmt.c so move it as
a request helper function to hci_request.c.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since the other discoverable changes are behind req_workqueue now it
only makes sense to move the discoverable timeout there as well.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The discoverable mode is intrinsically linked with the connectable
mode e.g. through sharing the same HCI command (Write Scan Enable) for
BR/EDR. It makes therefore sense to move it to hci_request.c and run
the changes through the same hdev->req_workqueue.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Class of Device needs to be changed e.g. for limited discoverable
mode. In preparation of moving the discoverable mode to hci_request.c
and hdev->req_workqueue, move the Class of Device helpers there first.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This way the connectable changes are synchronized against each other,
which helps avoid potential races. The connectable mode is also linked
together with LE advertising which makes is more convenient to have it
behind the same workqueue.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This paves the way for eventually performing advertising changes
through the hdev->req_workqueue. Some new APIs need to be exposed from
mgmt.c to hci_request.c and vice-versa, but many of them will go away
once hdev->req_workqueue gets used.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This way we avoid the need to do a forward declaration in later
patches.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since Add/Remove Device perform the page scan updates independently
from the HCI command completion we've introduced a potential race when
multiple mgmt commands are queued. Doing the page scan updates through
the req_workqueue ensures that the state changes are performed in a
race-free manner.
At the same time, to make the request helper more widely usable,
extend it to also cover Inquiry Scan changes since those are behind
the same HCI command. This is also reflected in the new name of the
API as well as the work struct name.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Change return type of nfulnl_set_timeout() and nfulnl_set_qthresh() to
be void.
This patch changes the return type of the static methods
nfulnl_set_timeout() and nfulnl_set_qthresh() to be void, as there is no
justification and no need for these methods to return int.
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 3bfe049807 ("netfilter: nfnetlink_{log,queue}:
Register pernet in first place") reorganised the initialisation
order of the pernet_subsys to avoid "use-before-initialised"
condition. However, in doing so the cleanup logic in nfnetlink_queue
got botched in that the pernet_subsys wasn't cleaned in case
nfnetlink_subsys_register failed. This patch adds the necessary
cleanup routine call.
Fixes: 3bfe049807 ("netfilter: nfnetlink_{log,queue}: Register pernet in first place")
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Valdis reports NULL deref in nf_ct_frag6_gather.
Problem is bogus use of skb_queue_walk() -- we miss first skb in the list
since we start with head->next instead of head.
In case the element we're looking for was head->next we won't find
a result and then trip over NULL iter.
(defrag uses plain NULL-terminated list rather than one terminated by
head-of-list-pointer, which is what skb_queue_walk expects).
Fixes: 029f7f3b87 ("netfilter: ipv6: nf_defrag: avoid/free clone operations")
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Only needed when meta nftrace rule(s) were added.
The assumption is that no such rules are active, so the call to
nft_trace_init is "never" needed.
When nftrace rules are active, we always call the nft_trace_* functions,
but will only send netlink messages when all of the following are true:
- traceinfo structure was initialised
- skb->nf_trace == 1
- at least one subscriber to trace group.
Adding an extra conditional
(static_branch ... && skb->nf_trace)
nft_trace_init( ..)
Is possible but results in a larger nft_do_chain footprint.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nft monitor mode can then decode and display this trace data.
Parts of LL/Network/Transport headers are provided as separate
attributes.
Otherwise, printing IP address data becomes virtually impossible
for userspace since in the case of the netdev family we really don't
want userspace to have to know all the possible link layer types
and/or sizes just to display/print an ip address.
We also don't want userspace to have to follow ipv6 header chains
to get the s/dport info, the kernel already did this work for us.
To avoid bloating nft_do_chain all data required for tracing is
encapsulated in nft_traceinfo.
The structure is initialized unconditionally(!) for each nft_do_chain
invocation.
This unconditionall call will be moved under a static key in a
followup patch.
With lots of help from Patrick McHardy and Pablo Neira.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbound. As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pull in configuration knobs from other subsystems so
that cgroup membership test can be avoided.
net_cls and net_prio controllers are examples of the latter. They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.
Both net_cls and net_prio aren't properly hierarchical. Both inherit
configuration from the parent on creation but there's no interaction
afterwards. An ancestor doesn't restrict the behavior in its subtree
in anyway and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration
implemented at the system level. net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.
While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.
In preparation, this patch updates sock->sk_cgrp_data handling so that
it points to the v2 cgroup that sock was created in until either
net_prio or net_cls is used. Once either of the two is used,
sock->sk_cgrp_data reverts to its previous role of carrying prioidx
and classid. This is to avoid adding yet another cgroup related field
to struct sock.
As the mode switching can happen at most once per boot, the switching
mechanism is aimed at lowering hot path overhead. It may leak a
finite, likely small, number of cgroup refs and report spurious
prioidx or classid on switching; however, dynamic updates of prioidx
and classid have always been racy and lossy - socks between creation
and fd installation are never updated, config changes don't update
existing sockets at all, and prioidx may index with dead and recycled
cgroup IDs. Non-critical inaccuracies from small race windows won't
make any noticeable difference.
This patch doesn't make use of the pointer yet. The following patch
will implement netfilter match for cgroup2 membership.
v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
cgroup specific field.
v3: Add comments explaining why sock_data_prioidx() and
sock_data_classid() use different fallback values.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce sock->sk_cgrp_data which is a struct sock_cgroup_data.
->sk_cgroup_prioidx and ->sk_classid are moved into it. The struct
and its accessors are defined in cgroup-defs.h. This is to prepare
for overloading the fields with a cgroup pointer.
This patch mostly performs equivalent conversions but the followings
are noteworthy.
* Equality test before updating classid is removed from
sock_update_classid(). This shouldn't make any noticeable
difference and a similar test will be implemented on the helper side
later.
* sock_update_netprioidx() now takes struct sock_cgroup_data and can
be moved to netprio_cgroup.h without causing include dependency
loop. Moved.
* The dummy version of sock_update_netprioidx() converted to a static
inline function while at it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
netprio builds per-netdev contiguous priomap array which is indexed by
css->id. The array is allocated using kzalloc() effectively limiting
the maximum ID supported to some thousand range. This patch caps the
maximum supported css->id to USHRT_MAX which should be way above what
is actually useable.
This allows reducing sock->sk_cgrp_prioidx to u16 from u32. The freed
up part will be used to overload the cgroup related fields.
sock->sk_cgrp_prioidx's position is swapped with sk_mark so that the
two cgroup related fields are adjacent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 0d76d6e8b2 and merge
commit c402293bd7, reversing changes made
to c89359a42e.
The virtio-vsock device specification is not finalized yet. Michael
Tsirkin voiced concerned about merging this code when the hardware
interface (and possibly the userspace interface) could still change.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the kernel generally uses negated error numbers, *err needs to be
compared with -EAGAIN (d'oh).
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ea3793ee29 ("core: enable more fine-grained datagram reception control")
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP SYNACK messages might now be attached to request sockets.
XFRM needs to get back to a listener socket.
Adds new helpers that might be used elsewhere :
sk_to_full_sk() and sk_const_to_full_sk()
Note: We also need to add RCU protection for xfrm lookups,
now TCP/DCCP have lockless listener processing. This will
be addressed in separate patches.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While cooking the sctp np->opt rcu fixes, I forgot to move
one rcu_read_unlock() after the added rcu_dereference() in
sctp_v6_get_dst()
This gave lockdep warnings reported by Dave Jones.
Fixes: c836a8ba93 ("ipv6: sctp: add rcu protection around np->opt")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- prevent compatibility issue between DAT and speedy join from creating
inconsistencies in the global translation table
- make sure temporary TT entries are purged out if not claimed
- fix comparison function used for TT hash table
- fix invalid stack access in batadv_dat_select_candidates()
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJWZZqMAAoJENpFlCjNi1MRPg4P/Rq04JLU51tzIYnC7dmkcKjY
L0qFTF4C4FjRzyeDH7d7GwfDJAu/AN/sCw/N0UzkBQ8qlGmtM5pkyTglr12w2axT
V2LalU8nR+VIqL/DvCmMAPthwRsXfAXAkwiKEPCbKp9GHOKuVF9uIC+UQP0V0HQE
6qKVii1j/IhVJWa30LrP8Uk/W4Ji55oB/oGBH4ggbWf0YWnaSldldfGGjtEY0JVd
dfcOyTCYX3NiGs9QB5HRz2NDhc0t1u0CWTTaM1n5w7l2ejlUH6JYwqOfAr5Q5STt
C265KonUeZCfq/ktzcEZ9c0NWHy8OePHA2anO1e5zPV/TwXNsSb50AnAHslaA7Fd
vgZ3svTjhxowU20P21gMYOJGlUpmjMx+klBEcd0rAKEZHjcq2IFLS4LA6I8en8o4
XtcJLj1Rrk/4XV9BntOlLVIfrA8TKH7B/vjAkbLa45WbH74B1V1sODmGxIv4mUVs
Vhe5lTyPktKFhducU0g4jDIg+Up07JIrMKs9/9TXWXuXf4tH255yg42dYSCOoqQD
4haKbUqQN90KvR1/cKGtkwpq0BlZq75RW4+SFtsUUZbdZyZNGx2zgR8g5e1owdN2
1I4c2XAsTbJMEaB86hsNWv/goy0H57dn4SNwi34BJr6OADfcorr2qGd4LRrTDUaL
hpwirVIN0SiaEz9krwDW
=tH3+
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- prevent compatibility issue between DAT and speedy join from creating
inconsistencies in the global translation table
- make sure temporary TT entries are purged out if not claimed
- fix comparison function used for TT hash table
- fix invalid stack access in batadv_dat_select_candidates()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move dsa slave dedicated code from dsa_switch_destroy to a new
dsa_slave_destroy function in slave.c.
Add the netif_carrier_off and phy_disconnect calls in order to
correctly cleanup the netdev state and PHY state machine.
Signed-off-by: Frode Isaksen <fisaksen@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon probe failure or unbinding, add missing dev_put() calls.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that we unassign the master_netdev dsa_ptr to make the packet
processing go through the regular Ethernet receive path.
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since no more DSA driver uses the polling callback, and since
the phylib handles the link detection, remove the link polling
work and timer code.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Locally generated IPv4 and (probably) IPv6 packets are dropped because
skb->protocol isn't set. We could write wrappers to lwtunnel_output
for IPv4 and IPv6 that set the protocol accordingly and then call
lwtunnel_output, but mpls_output relies on the AF-specific type of dst
anyway to get the via address.
Therefore, make use of dst->dst_ops->family in mpls_output to
determine the type of nexthop and thus protocol of the packet instead
of checking skb->protocol.
Fixes: 61adedf3e3 ("route: move lwtunnel state to dst_entry")
Reported-by: Sam Russell <sam.h.russell@gmail.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NFSv4.1 callback channel is currently broken because the receive
message will keep shrinking because the backchannel receive buffer size
never gets reset.
The easiest solution to this problem is instead of changing the receive
buffer, to rather adjust the copied request.
Fixes: 38b7631fbe ("nfs4: limit callback decoding to received bytes")
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
needing to reshuffle and fix some bugs. I merged mac80211
to get the right base for some of these changes.
* new mac80211 API for upcoming driver changes: EOSP handling,
key iteration
* scan abort changes allowing to cancel an ongoing scan
* VHT IBSS 80+80 MHz support
* re-enable full AP client state tracking after fixes
* various small fixes (that weren't relevant for mac80211)
* various cleanups
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJWZVw7AAoJEGt7eEactAAdQcgP/1bOBBKgCHWZ8xhqmhLIPPUP
AgkkyBcjCbSOWyE1axm5WQZM+fQvyGAcYsnhsK7h0Wy5Jvv6goNYhxkoD3L5lAKC
LkiiqokTpLx1Em6Iugn1sdgag8q7EquYYQN+hOEOWtp32pTsx3/pDglCtGu0SX1N
eystHEAu6mzPezat99M4s80fRlfBop3yaUuL5XopQFGtU37zfUgoXJB3BoXgxNjK
XyD22jtPDreDMndZ9ugfvMaiq3iKRBhKXqgGb3SqMaStIyRK8zAkHb5jg3CllMeq
bEsz4Rb4r+vtm2AVsUMWjfd/upQKwPwuvdvCvv4AQCO+aR9Rm+tR/wnnD4Gtnek5
zPQ6XWt/0V4CKGl+W9shnDSA1DZ3hTijJlaGsK+RUqEtdq903lEP7fc2GsSvlund
jXHfOExieuZOToKWTKpmNGsCw6fjJaGXNd/iLWo5VGAZS2X+JLmFZ94g43a6zOGZ
s1Gz4F3tz4u4Bd26NAK2Z6CQRvDS4OOyLIjl9vpB9Fk/9nQx3f7WD8aBTRuCVAtG
U2sFEUscz3rkdct30Gvkjm3ovmgc4pomTDvOpmNIsSCi2ygzGWHbEvSrrHdIjzVy
KDcvRs6bRtCL/WxaaEIk46M6+6aKlSnZytPLl7vkNnvxXuEF7GYdnNVSUbSH9Nte
XzT4+rZRiqyPZEGhBekw
=+5dd
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2015-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
This pull request got a bit bigger than I wanted, due to
needing to reshuffle and fix some bugs. I merged mac80211
to get the right base for some of these changes.
* new mac80211 API for upcoming driver changes: EOSP handling,
key iteration
* scan abort changes allowing to cancel an ongoing scan
* VHT IBSS 80+80 MHz support
* re-enable full AP client state tracking after fixes
* various small fixes (that weren't relevant for mac80211)
* various cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The following commit which went into mainline through networking tree
3b13758f51 ("cgroups: Allow dynamically changing net_classid")
conflicts in net/core/netclassid_cgroup.c with the following pending
fix in cgroup/for-4.4-fixes.
1f7dd3e5a6 ("cgroup: fix handling of multi-destination migration from subtree_control enabling")
The former separates out update_classid() from cgrp_attach() and
updates it to walk all fds of all tasks in the target css so that it
can be used from both migration and config change paths. The latter
drops @css from cgrp_attach().
Resolve the conflict by making cgrp_attach() call update_classid()
with the css from the first task. We can revive @tset walking in
cgrp_attach() but given that net_cls is v1 only where there always is
only one target css during migration, this is fine.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Nina Schiff <ninasc@fb.com>
batadv_dat_select_candidates provides an u32 to batadv_hash_dat but it
needs a batadv_dat_entry with at least ip and vid filled in.
Fixes: 3e26722bc9f2 ("batman-adv: make the Distributed ARP Table vlan aware")
Signed-off-by: Sven Eckelmann <sven@open-mesh.com>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The translation table implementation, namely batadv_compare_tt(),
is used to compare two client entries and deciding if they are the
holding the same information. Each client entry is identified by
its mac address and its VLAN id (VID).
Consequently, batadv_compare_tt() has to not only compare the mac
addresses but also the VIDs.
Without this fix adding a new client entry that possesses the same
mac address as another client but operates on a different VID will
fail because both client entries will considered identical.
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
In the case when a temporary entry is added first and a proper tt entry
is added after that, the temporary tt entry is kept in the orig list.
However the temporary flag is removed at this point, and therefore the
purge function can not find this temporary entry anymore.
Therefore, remove the previous temp entry before adding the new proper
one.
This case can happen if a client behind a given originator moves before
the TT announcement is sent out. Other than that, this case can also be
created by bogus or malicious payload frames for VLANs which are not
existent on the sending originator.
Reported-by: Alessandro Bolletta <alessandro@mediaspot.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
DAT Cache replies are answered on behalf of other clients which are not
connected to the answering originator. Therefore, we shouldn't add these
clients to the answering originators TT table through speed join to
avoid bogus entries.
Reported-by: Alessandro Bolletta <alessandro@mediaspot.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
In case of HW ROC, when the driver reports that the ROC expired,
it is not sufficient to purge the ROCs based on the remaining
time, as it possible that the device finished the ROC session
before the actual requested duration.
To handle such cases, in case of ROC expired notification from
the driver, complete all the ROCs which are marked with hw_begun,
regardless of the remaining duration.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The current unix_dgram_recvsmg code acquires the u->readlock mutex in
order to protect access to the peek offset prior to calling
__skb_recv_datagram for actually receiving data. This implies that a
blocking reader will go to sleep with this mutex held if there's
presently no data to return to userspace. Two non-desirable side effects
of this are that a later non-blocking read call on the same socket will
block on the ->readlock mutex until the earlier blocking call releases it
(or the readers is interrupted) and that later blocking read calls
will wait longer than the effective socket read timeout says they
should: The timeout will only start 'ticking' once such a reader hits
the schedule_timeout in wait_for_more_packets (core.c) while the time it
already had to wait until it could acquire the mutex is unaccounted for.
The patch avoids both by using the __skb_try_recv_datagram and
__skb_wait_for_more packets functions created by the first patch to
implement a unix_dgram_recvmsg read loop which releases the readlock
mutex prior to going to sleep and reacquires it as needed
afterwards. Non-blocking readers will thus immediately return with
-EAGAIN if there's no data available regardless of any concurrent
blocking readers and all blocking readers will end up sleeping via
schedule_timeout, thus honouring the configured socket receive timeout.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The __skb_recv_datagram routine in core/ datagram.c provides a general
skb reception factility supposed to be utilized by protocol modules
providing datagram sockets. It encompasses both the actual recvmsg code
and a surrounding 'sleep until data is available' loop. This is
inconvenient if a protocol module has to use additional locking in order
to maintain some per-socket state the generic datagram socket code is
unaware of (as the af_unix code does). The patch below moves the recvmsg
proper code into a new __skb_try_recv_datagram routine which doesn't
sleep and renames wait_for_more_packets to
__skb_wait_for_more_packets, both routines being exported interfaces. The
original __skb_recv_datagram routine is reimplemented on top of these
two functions such that its user-visible behaviour remains unchanged.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
when A sends a data to B, then A close() and enter into SHUTDOWN_PENDING
state, if B neither claim his rwnd is 0 nor send SACK for this data, A
will keep retransmitting this data until t5 timeout, Max.Retrans times
can't work anymore, which is bad.
if B's rwnd is not 0, it should send abort after Max.Retrans times, only
when B's rwnd == 0 and A's retransmitting beyonds Max.Retrans times, A
will start t5 timer, which is also commit f8d9605243 ("sctp: Enforce
retransmission limit during shutdown") means, but it lacks the condition
peer rwnd == 0.
so fix it by adding a bit (zero_window_announced) in peer to record if
the last rwnd is 0. If it was, zero_window_announced will be set. and use
this bit to decide if start t5 timer when local.state is SHUTDOWN_PENDING.
Fixes: commit f8d9605243 ("sctp: Enforce retransmission limit during shutdown")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the chunks are enqueued successfully but sctp_cmd_interpreter()
return err to sctp_sendmsg() (mainly because of no mem), the chunks will
get re-queued, but we are dropping the reference and freeing them.
The fix is to just drop the reference on the datamsg just as it had
succeeded, as:
- if the chunks weren't queued, this is enough to get them freed.
- if they were queued, they will get freed when they finally get out or
discarded.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a msg is sent, sctp will hold the chunks of this msg and then try
to enqueue them. But if the chunks are not enqueued in sctp_outq_tail()
because of the invalid state, sctp_cmd_interpreter() may still return
success to sctp_sendmsg() after calling sctp_outq_flush(), these chunks
will become orphans and will leak.
So we fix them by moving sctp_chunk_hold() to sctp_outq_tail(), where we
are sure that the chunk is going to get queued.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we are keeping timestamps on when copying the socket, we also have to
copy sk_tsflags.
This is needed since b9f40e21ef ("net-timestamp: move timestamp flags
out of sk_flags").
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy
related to disabling sock timestamp.
When SCTP accepts an association or peel one off, it copies sock flags
but forgot to call net_enable_timestamp() if a packet timestamping flag
was copied, leading to extra calls to net_disable_timestamp() whenever
such clones were closed.
The fix is to call net_enable_timestamp() whenever we copy a sock with
that flag on, like tcp does.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP echoes a cookie o INIT ACK chunks that contains a timestamp, for
detecting stale cookies. This cookie is echoed back to the server by the
client and then that timestamp is checked.
Thing is, if the listening socket is using packet timestamping, the
cookie is encoded with ktime_get() value and checked against
ktime_get_real(), as done by __net_timestamp().
The fix is to sctp also use ktime_get_real(), so we can compare bananas
with bananas later no matter if packet timestamping was enabled or not.
Fixes: 52db882f3f ("net: sctp: migrate cookie life from timeval to ktime")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Eric, these helpers should have const dev param.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 64236f3f3d ("ipv6: introduce IFA_F_STABLE_PRIVACY flag")
failed to update the setting of the IFA_F_OPTIMISTIC flag, causing
the IFA_F_STABLE_PRIVACY flag to be lost if IFA_F_OPTIMISTIC is set.
Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 64236f3f3d ("ipv6: introduce IFA_F_STABLE_PRIVACY flag")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A network interface can change type. It may change from a type which
batman does not support, e.g. hdlc, to one it does, e.g. hdlc-eth.
When an interface changes type, it sends two notifications. Handle
these notifications.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
An interface changing type may not have IPv6 addresses. Don't
call the address configuration type change in this case.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parameters were updated only if the kernel was unable to find the tunnel
with the new parameters, ie only if core pamareters were updated (keys,
addr, link, type).
Now it's possible to update ttl, hoplimit, flowinfo and flags.
Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When cancelling, you can cancel "any" (first in list) mgmt-tx
or remain-on-channel operation by using the value 0 for the
cookie along with the *opposite* operation, i.e.
* cancel the first mgmt-tx by cancelling roc with 0 cookie
* cancel the first roc by cancelling mgmt-tx with 0 cookie
This isn't really that bad since userspace should only pass
cookies that we gave it, but could lead to hard-to-debug
issues so better prevent it and reject zero values since we
never hand those out.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
While it was possible to create an IBSS with 80+80 MHz channel, joining
such an IBSS resulted in falling back to 20 MHz channel with VHT
disabled due to a missing switch case for 80+80.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The same piece of code appears at two places. Make a function from it.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jouni found a bug in the remain-on-channel logic: when a short item
is queued, a long item is combined with it extending the original
one, and then the long item is deleted, the timeout doesn't go back
to the short one, and the short item ends up taking a long time. In
this case, this showed as blocking scan when running two test cases
back to back - the scan from the second was delayed even though all
the remain-on-channel items should long have been gone.
Fixing this with the current data structures turns out to be a bit
complicated, we just remove the long item from the dependents list
right now and don't recalculate the timeouts.
There's a somewhat similar bug where we delete the short item and
all the dependents go with it; to fix this we'd have to move them
from the dependents to the real list.
Instead of trying to do that, rewrite the code to not have all this
complexity in the data structures: use a single list and allow more
than one entry in it being marked as started. This makes the code a
bit more complex, the worker needs to understand that it might need
to just remove one of the started items, while keeping the device
off-channel, but that's not more complicated than the nested data
structures.
This then fixes both issues described, and makes it easier to also
limit the overall off-channel time when combining.
TODO: as before, with hardware remain-on-channel, deleting an item
after combining results in cancelling them all - we can keep track
of the time elapsed and only cancel after that to fix this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since the cookie is assigned inside ieee80211_make_ack_skb()
now, we no longer need to return the ack_skb as the cookie
and can simplify the function's return and the callers. Also
rename it to ieee80211_attach_ack_skb() to more accurately
reflect its purpose.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is quite a bit of code that logically depends here since
it has to deal with all the remain-on-channel logic.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a mgmt-tx operation is aborted before it runs, the wrong
cookie is reported back to userspace, and the ack_skb gets
leaked since the frame is freed directly instead of freeing
it using ieee80211_free_txskb(). Fix that.
Fixes: 3b79af973c ("mac80211: stop using pointers as userspace cookies")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If some code stops the queues more times than having started
(for when refcounting is used), warn on and reset the counter
to 0 to avoid blocking forever.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When freeing the TX skb for an off-channel TX, use the correct
API to also free the ACK skb that might have been allocated.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a new station is added to AP/GO interfaces the default behaviour
is for it to be added authenticated and associated, due to backwards
compatibility. To prevent that, the driver must be able to do that
(setting the NL80211_FEATURE_FULL_AP_CLIENT_STATE feature flag) and
userspace must set the flag mask to auth|assoc and clear the set.
Handle this quirk in the API entirely in nl80211, and always push the
full flags to the drivers. NL80211_FEATURE_FULL_AP_CLIENT_STATE is
still required for userspace to be allowed to set the mask including
those bits, but after checking that add both flags to the mask and
set in case userspace didn't set them otherwise.
This obsoletes the mac80211 code handling this difference, no other
driver is currently using these flags.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix nl80211_set_station() to use the value of NL80211_ATTR_STA_AID
attribute instead of NL80211_ATTR_PEER_AID attribute.
Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This commit adds implementation for abort scan in mac80211.
Reviewed-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com>
Signed-off-by: Sunil Dutt <usdutt@qti.qualcomm.com>
[adjust to wdev change in previous patch and clean up code a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement new functionality for aborting an ongoing scan.
Add NL80211_CMD_ABORT_SCAN to the nl80211 interface. After
aborting the scan, driver shall provide the scan status by
calling cfg80211_scan_done().
Reviewed-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com>
Signed-off-by: Sunil Dutt <usdutt@qti.qualcomm.com>
[change command to take wdev instead of netdev so that it
can be used on p2p-device scans]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add new VIF flag, that will allow get NOA update
notification when driver will request this, even
this is not pure P2P vif (eg. STA vif).
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Last caller of this function was removed in 3.17 in commit
97dc94f1d9.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
add ieee80211_iter_keys_rcu() to iterate over uploaded
keys in atomic context (when rcu is locked)
The station removal code removes the keys only after
calling synchronize_net(), so it's not safe to iterate
the keys at this point (and postponing the actual key
deletion with call_rcu() might result in some
badly-ordered ops calls).
Add a flag to indicate a station is being removed,
and skip the configured keys if it's set.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This can happen when the driver needs to send less frames
than expected and then needs to close the SP.
Mac80211 still needs to set the more_data properly based
on its buffer state (ps_tx_buffer and buffered frames on
other TIDs).
To that end, refactor the code that delivers frames upon
uAPSD trigger frames to be able to get only the more_data
bit without actually delivering those frames in case the
driver is just asking to set a NDP with EOSP and MORE_DATA
bit properly set.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This really should never happen except very early in the process
of bringing up a new driver, at which point you'll have to add
more debugging in the driver and this string isn't useful. Remove
it and save some size (when it's even compiled in.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This indicates a driver key selection issue, but even then there's
no point in printing it all the time, so ratelimit it. Also remove
the priv pointer from it -- people debugging will only have a single
device anyway and it's useless as anything but a cookie.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no point in printing the mpath pointer since it can't
be used for anything - print the MAC address instead (like in
the forwarding case.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function is a very simple wrapper around another one,
just adds a few default parameters, so replace it with a
static inline instead of using EXPORT_SYMBOL, reducing
the module size slightly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Complete the tracepoint with the missing data - it's not printed
by default (a lot of it is dynamic arrays) but will be recorded
and be available during post-processing.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some devices or drivers cannot deal with having the same station
address for different virtual interfaces, say as a client to two
virtual AP interfaces. Rather than requiring each driver with a
limitation like that to enforce it, add a hardware flag for it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
I want to get the full off-channel bugfix since later code depends on
it, as well as the AP client state change so I can revert it correctly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Conflicts:
drivers/net/ethernet/renesas/ravb_main.c
kernel/bpf/syscall.c
net/ipv4/ipmr.c
All three conflicts were cases of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"A lot of Thanksgiving turkey leftovers accumulated, here goes:
1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.
2) IDs for some new iwlwifi chips, from Oren Givon.
3) Fix rtlwifi lockups on boot, from Larry Finger.
4) Fix memory leak in fm10k, from Stephen Hemminger.
5) We have a route leak in the ipv6 tunnel infrastructure, fix from
Paolo Abeni.
6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim.
7) Wrong lockdep annotations in tcp md5 support, fix from Eric
Dumazet.
8) Work around some middle boxes which prevent proper handling of TCP
Fast Open, from Yuchung Cheng.
9) TCP repair can do huge kmalloc() requests, build paged SKBs
instead. From Eric Dumazet.
10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
Borkmann.
11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
Nikolay Aleksandrov.
12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
Weikusat.
13) Fix double free in VRF code, from Nikolay Aleksandrov.
14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.
15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian.
16) Fix clearing of persistent array maps in bpf, from Daniel
Borkmann.
17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
early enough. From Eric Dumazet.
18) Fix out of bounds accesses in bpf array implementation when
updating elements, from Daniel Borkmann.
19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
Dumazet.
20) When dumping proxy neigh entries, we have to accomodate NULL
device pointers properly, from Konstantin Khlebnikov.
21) SCTP doesn't release all ipv6 socket resources properly, fix from
Eric Dumazet.
22) Prevent underflows of sch->q.qlen for multiqueue packet
schedulers, also from Eric Dumazet.
23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
Huang and Michael Chan.
24) Don't actively scan radar channels, from Antonio Quartulli"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
net: phy: reset only targeted phy
bnxt_en: Setup uc_list mac filters after resetting the chip.
bnxt_en: enforce proper storing of MAC address
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
net: lpc_eth: remove irq > NR_IRQS check from probe()
net_sched: fix qdisc_tree_decrease_qlen() races
openvswitch: fix hangup on vxlan/gre/geneve device deletion
ipv4: igmp: Allow removing groups from a removed interface
ipv6: sctp: implement sctp_v6_destroy_sock()
arm64: bpf: add 'store immediate' instruction
ipv6: kill sk_dst_lock
ipv6: sctp: add rcu protection around np->opt
net/neighbour: fix crash at dumping device-agnostic proxy entries
sctp: use GFP_USER for user-controlled kmalloc
sctp: convert sack_needed and sack_generation to bits
ipv6: add complete rcu protection around np->opt
bpf: fix allocation warnings in bpf maps and integer overflow
mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
net: mvneta: enable setting custom TX IP checksum limit
net: mvneta: fix error path for building skb
...
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJWX2JpAAoJEGt7eEactAAd+9cQAJmn3zt0orj/sASv7BeF0h5d
sRfAhkBVOTZur8MgVj1c7fNzT3h1HYNei6c4SA2+rphy6Vbifoli1nLNloC+1Ld4
2WXllEqVe473GqVofCxZHsYZPr2Inmhj7uMiDqvoKUiSRz7phmkY9m+Vju6WZG/W
F6FrTLqFS7UDHIYNYH1DNVSScd/89Gu6pHZvEpoHkrsvt5rEEZiPAQ7sDB4MAMSm
amETtuBqgX83gHR2G4UT2Z9r8TVdzhO+s7vvdVjj0qbP6C6BaS9IUXDjmm3gOvHy
G7j9MJuUC8w/2fZ5A5/l94OuN5rF/ZFMNkn2e6OIzg0HjEZh74CeLl21CnuxdNpB
ECmDVbKoI3OVoFbhEl7P5fBokzZsqhAXpZOmbYEeFRyO6lF2Mv9uzttsF6EOmCX0
BjIoEXOWA2o6IUD/8M6NjW+/B58SDDVi9Mg6D+7Dn7rUFlQ4pddjb0m94bI8GQQU
wl7gROMvYR3tIhiMs1bLF9jJgA831WGWu9eiq8mT2kHPaEV2bFO7OK+SUxyZu1M7
UhN4eoLpU84v9QNJ34N8RCiYxEZ1e6HQxBwQn/fDIWOjOHryZoArhicFY9aOEja4
9xBI9OJhBWOL4N4AFdmTExBdYudSgCTpX+/gQ4tSfedz3lqF79y8+PILwv6E1Q6D
8pScH/4pVo4v5omGaMpA
=vui7
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A small set of fixes for 4.4:
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5405ff6e15 ("tipc: convert node lock to rwlock")
introduced a bug to the node reference counter handling. When a
message is successfully sent in the function tipc_node_xmit(),
we return directly after releasing the node lock, instead of
continuing and decrementing the node reference counter as we
should do.
This commit fixes this bug.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Register PF_PPPOX with pppox module rather than with pppoe,
so that pppoe doesn't get loaded for any PF_PPPOX socket.
* Register PX_PROTO_* with standard MODULE_ALIAS_NET_PF_PROTO()
instead of using pppox's own naming scheme.
* While there, add auto-loading feature for pptp.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable virtio-vsock and vhost-vsock.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VM sockets virtio transport implementation. This module runs in guest
kernel.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module contains the common code and header files for the following
virtio-vsock and virtio-vhost kernel modules.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.
Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).
dead routes:
-----------
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
$ip link set dev swp1 down
$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1 dead linkdown
nexthop as to 700 via inet 10.1.1.6 dev swp2
linkdown routes:
----------------
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
/* carrier goes down */
$ip link show dev swp1
4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1 linkdown
nexthop as to 700 via inet 10.1.1.6 dev swp2
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.
One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not : Daniele
reported underflows errors :
[ 681.774821] PAX: sch->q.qlen: 0 n: 1
[ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1
[ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[ 681.774962] Call Trace:
[ 681.774967] [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
[ 681.774970] [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[ 681.774972] [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[ 681.774976] [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[ 681.774978] [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[ 681.774980] [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[ 681.774983] [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[ 681.774985] [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
[ 681.774987] [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[ 681.774989] [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[ 681.774991] [<ffffffffa9089560>] ? sort_range+0x40/0x40
[ 681.774992] [<ffffffffa9085fe4>] kthread+0xe4/0x100
[ 681.774994] [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[ 681.774995] [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.
A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.
Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.
Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.
A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.
Reported-by: Daniele Fucini <dfucini@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting a value bigger than 255 resulted in using only the lower eight
bits of that value as it is assigned to the u8 header field. To avoid
this unexpected result, reject such values.
Setting a value of zero is technically possible, but hosts receiving
such a packet have to treat it like hop_limit was set to one, according
to RFC2460. Therefore I don't see a use-case for that.
Setting a route's hop_limit to zero in iproute2 means to use the sysctl
default, which is not the case here: Setting e.g.
net.conf.eth0.hop_limit=0 will not make the kernel use
net.conf.all.hop_limit for outgoing packets on eth0. To avoid these
kinds of confusion, reject zero.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc39 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.
If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a53 ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.
The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a53 "(igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
We need to call inet6_destroy_sock() to properly release
inet6 specific fields.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2015-12-01
Here's a Bluetooth fix for the 4.4-rc series that fixes a memory leak of
the Security Manager L2CAP channel that'll happen for every LE
connection.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When lower device like bonding slave, team/bridge port, etc changes its
state, it is useful for others to notice this change. Currently this is
implemented specificly for bonding as NETDEV_BONDING_INFO notifier. This
patch aims to replace this specific usage and make this more generic to
be used for all upper-lower devices.
Introduce NETDEV_CHANGELOWERSTATE netdev notifier type and
netdev_lower_state_changed() helper.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes the drivers and other code would find it handy to know some
internal information about upper device being changed. So allow upper-code
to pass information down to notifier listeners during linking.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate netdev_master_upper_dev_link_private and pass priv directly as
a parameter of netdev_master_upper_dev_link.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
switchdev drivers reflect the newly requested topology to hardware when
CHANGEUPPER is received, after software links were already formed.
However, the operation can fail and user will not be notified, as the
return value of the notifier is not checked.
Add this check and rollback software links if necessary.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.
ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.
__ip6_dst_store() and ip6_dst_store() share same implementation.
sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()
Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.
This is because ip6_dst_store() can be called from process context,
without any lock held.
As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()
Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch completes the work I did in commit 45f6fad84c
("ipv6: add complete rcu protection around np->opt"), as I missed
sctp part.
This simply makes sure np->opt is used with proper RCU locking
and accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consider the following v2 hierarchy.
P0 (+memory) --- P1 (-memory) --- A
\- B
P0 has memory enabled in its subtree_control while P1 doesn't. If
both A and B contain processes, they would belong to the memory css of
P1. Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter. IOW, enabling controllers
can cause atomic migrations into different csses.
The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses. pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.
WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
Modules linked in:
CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
...
ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
Call Trace:
[<ffffffff81551ffc>] dump_stack+0x4e/0x82
[<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
[<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
[<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
[<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
[<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
[<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
[<ffffffff81189016>] cgroup_attach_task+0x176/0x200
[<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
[<ffffffff81189684>] cgroup_procs_write+0x14/0x20
[<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
[<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
[<ffffffff81265f88>] __vfs_write+0x28/0xe0
[<ffffffff812666fc>] vfs_write+0xac/0x1a0
[<ffffffff81267019>] SyS_write+0x49/0xb0
[<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76
This patch fixes the bug by removing @css parameter from the three
migration methods, ->can_attach, ->cancel_attach() and ->attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated. All controllers are
updated accordingly.
* Controllers which don't care whether there are one or multiple
target csses can be converted trivially. cpu, io, freezer, perf,
netclassid and netprio fall in this category.
* cpuset's current implementation assumes that there's single source
and destination and thus doesn't support v2 hierarchy already. The
only change made by this patchset is how that single destination css
is obtained.
* memory migration path already doesn't do anything on v2. How the
single destination css is obtained is updated and the prep stage of
mem_cgroup_can_attach() is reordered to accomodate the change.
* pids is the only controller which was affected by this bug. It now
correctly handles multi-destination migrations and no longer causes
counter underflow from incorrect accounting.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Proxy entries could have null pointer to net-device.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Fixes: 84920c1420 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: David S. Miller <davem@davemloft.net>
If tcp_send_ack() can not allocate skb, we properly handle this
and setup a timer to try later.
Use __GFP_NOWARN to avoid polluting syslog in the case host is
under memory pressure, so that pertinent messages are not lost under
a flood of useless information.
sk_gfp_atomic() can use its gfp_mask argument (all callers currently
were using GFP_ATOMIC before this patch)
We rename sk_gfp_atomic() to sk_gfp_mask() to clearly express this
function now takes into account its second argument (gfp_mask)
Note that when tcp_transmit_skb() is called with clone_it set to false,
we do not attempt memory allocations, so can pass a 0 gfp_mask, which
most compilers can emit faster than a non zero or constant value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov reported that the user could trigger a kernel warning by
using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
value directly affects the value used as a kmalloc() parameter.
This patch thus switches the allocation flags from all user-controllable
kmalloc size to GFP_USER to put some more restrictions on it and also
disables the warn, as they are not necessary.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the last change here, I neglected to update the cookie in one code
path: when a mgmt-tx has no real cookie sent to userspace as it doesn't
wait for a response, but is off-channel. The original code used the SKB
pointer as the cookie and always assigned the cookie to the TX SKB in
ieee80211_start_roc_work(), but my change turned this around and made
the code rely on a valid cookie being passed in.
Unfortunately, the off-channel no-wait TX path wasn't assigning one at
all, resulting in an uninitialized stack value being used. This wasn't
handed back to userspace as a cookie (since in the no-wait case there
isn't a cookie), but it was tested for non-zero to distinguish between
mgmt-tx and off-channel.
Fix this by assigning a dummy non-zero cookie unconditionally, and get
rid of a misleading comment and some dead code while at it. I'll clean
up the ACK SKB handling separately later.
Fixes: 3b79af973c ("mac80211: stop using pointers as userspace cookies")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
DFS channels should not be actively scanned as we can't be sure
if we are allowed or not.
If the current channel is in the DFS band, active scan might be
performed after CSA, but we have no guarantee about other channels,
therefore it is safer to prevent active scanning at all.
Cc: stable@vger.kernel.org
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Interfaces are being initialized (setup) on addition,
and torn down on removal.
However, p2p device is being torn down when stopped,
resulting in the next p2p start operation being done
on uninitialized interface.
Solve it by calling ieee80211_teardown_sdata() only
on interface removal (for the non-netdev case).
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[squashed in fix to call teardown after unregister]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
After 614732eaa1, no refcount is maintained for the vport-vxlan module.
This allows the userspace to remove such module while vport-vxlan
devices still exist, which leads to later oops.
v1 -> v2:
- move vport 'owner' initialization in ovs_vport_ops_register()
and make such function a macro
Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry provided a syzkaller (http://github.com/google/syzkaller)
triggering a fault in sock_wake_async() when async IO is requested.
Said program stressed af_unix sockets, but the issue is generic
and should be addressed in core networking stack.
The problem is that by the time sock_wake_async() is called,
we should not access the @flags field of 'struct socket',
as the inode containing this socket might be freed without
further notice, and without RCU grace period.
We already maintain an RCU protected structure, "struct socket_wq"
so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
is the safe route.
It also reduces number of cache lines needing dirtying, so might
provide a performance improvement anyway.
In followup patches, we might move remaining flags (SOCK_NOSPACE,
SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
being mostly read and let it being shared between cpus.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a cleanup to make following patch easier to
review.
Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()
To ease backports, we rename both constants.
Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit ab450605b3.
In IPv6, we cannot inherit the dst of the original dst. ndisc packets
are IPv6 packets and may take another route than the original packet.
This patch breaks the following scenario: a packet comes from eth0 and
is forwarded through vxlan1. The encapsulated packet triggers an NS
which cannot be sent because of the wrong route.
CC: Jiri Benc <jbenc@redhat.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current unix_dgram_recvmsg does a wake up for every received
datagram. This seems wasteful as only SOCK_DGRAM client sockets in an
n:1 association with a server socket will ever wait because of the
associated condition. The patch below changes the function such that the
wake up only happens if wq_has_sleeper indicates that someone actually
wants to be notified. Testing with SOCK_SEQPACKET and SOCK_DGRAM socket
seems to confirm that this is an improvment.
Signed-Off-By: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :
WARN_ON(tp->copied_seq != tp->rcv_nxt &&
!(flags & (MSG_PEEK | MSG_TRUNC)));
His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.
Thanks again Dmitry for your report and testings.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to add and remove MFC entries. It uses the
same attributes like the already present dump support in order to be
consistent. There's one new entry - RTA_PREFSRC, it's used to denote an
MFC_PROXY entry (see MRT_ADD_MFC vs MRT_ADD_MFC_PROXY).
The already existing infrastructure is used to create and delete the
entries, the netlink message gets converted internally to a struct mfcctl
which is used with ipmr_mfc_add/delete.
The other used attributes are:
RTA_IIF - used for mfcc_parent (when adding it's required to be valid)
RTA_SRC - used for mfcc_origin
RTA_DST - used for mfcc_mcastgrp
RTA_TABLE - the MRT table id
RTA_MULTIPATH - the "oifs" ttl array
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can have both errors and we'll return the second one, fix it to
return an error at a time as it's normal. I've overlooked this in my
previous set.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the inline pimsm_enabled() to pim.h and rename it to
ipmr_pimsm_enabled to show it's for the ipv4 ipmr code since pim.h is
used by IPv6 too.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the definitions of VIF_EXISTS() and struct mr_table to mroute.h
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MFC_NOTIFY was introduced in kernel 2.1.68 but afaik it hasn't been used
and I couldn't find any users currently so just remove it. Only
MFC_STATIC is left, so move it into an enum, add a description and use
BIT().
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like many files are including mroute.h unnecessarily, so remove
the include. Most importantly remove it from ipv6.
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sendpage did not care about credentials at all. This could lead to
situations in which because of fd passing between processes we could
append data to skbs with different scm data. It is illegal to splice those
skbs together. Instead we have to allocate a new skb and if requested
fill out the scm details.
Fixes: 869e7c6248 ("net: af_unix: implement stream sendpage support")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory barrier in the helper wq_has_sleeper is needed by just
about every user of waitqueue_active. This patch generalises it
by making it take a wait_queue_head_t directly. The existing
helper is renamed to skwq_has_sleeper.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9c7077622d ("packet: make packet_snd fail on len smaller
than l2 header") added validation for the packet size in packet_snd.
This change enforces that every packet needs a header (with at least
hard_header_len bytes) plus a payload with at least one byte. Before
this change the payload was optional.
This fixes PPPoE connections which do not have a "Service" or
"Host-Uniq" configured (which is violating the spec, but is still
widely used in real-world setups). Those are currently failing with the
following message: "pppd: packet size is too short (24 <= 24)"
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for mangling packet payload. Checksum for the specified base
header is updated automatically if requested, however no updates for any
kind of pseudo headers are supported, meaning no stateless NAT is supported.
For checksum updates different checksumming methods can be specified. The
currently supported methods are NONE for no checksum updates, and INET for
internet type checksums.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Various files are owned by root with 0440 permission. Reading them is
impossible in an unprivileged user namespace, interfering with firewall
tools. For instance, iptables-save relies on /proc/net/ip_tables_names
contents to dump only loaded tables.
This patch assigned ownership of the following files to root in the
current namespace:
- /proc/net/*_tables_names
- /proc/net/*_tables_matches
- /proc/net/*_tables_targets
- /proc/net/nf_conntrack
- /proc/net/nf_conntrack_expect
- /proc/net/netfilter/nfnetlink_log
A mapping for root must be available, so this order should be followed:
unshare(CLONE_NEWUSER);
/* Setup the mapping */
unshare(CLONE_NEWNET);
Signed-off-by: Philip Whineray <phil@firehol.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket. The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket. This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().
Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.
I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.
Complete earlier incomplete fix to CVE-2015-6937:
74e98eb085 ("RDS: verify the underlying transport exists before creating a connection")
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During pre-upstream development, the openvswitch datapath used a custom
hashtable to store vports that could fail on delete due to lack of
memory. However, prior to upstream submission, this code was reworked to
use an hlist based hastable with flexible-array based buckets. As such
the failure condition was eliminated from the vport_del path, rendering
this comment invalid.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since (at least) commit b17a7c179d ("[NET]: Do sysfs registration as
part of register_netdevice."), netdev_run_todo() deals only with
unregistration, so we don't need to do the rtnl_unlock/lock cycle to
finish registration when failing pimreg or dvmrp device creation. In
fact that opens a race condition where someone can delete the device
while rtnl is unlocked because it's fully registered. The problem gets
worse when netlink support is introduced as there are more points of entry
that can cause it and it also makes reusing that code correctly impossible.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally, the transmit phase of a client call is implicitly ack'd by the
reception of the first data packet of the response being received.
However, if a security negotiation happens, the transmit phase, if it is
entirely contained in a single packet, may get an ack packet in response
and then may get aborted due to security negotiation failure.
Because the client has shifted state to RXRPC_CALL_CLIENT_AWAIT_REPLY due
to having transmitted all the data, the code that handles processing of the
received ack packet doesn't note the hard ack the data packet.
The following abort packet in the case of security negotiation failure then
incurs an assertion failure when it tries to drain the Tx queue because the
hard ack state is out of sync (hard ack means the packets have been
processed and can be discarded by the sender; a soft ack means that the
packets are received but could still be discarded and rerequested by the
receiver).
To fix this, we should record the hard ack we received for the ack packet.
The assertion failure looks like:
RxRPC: Assertion failed
1 <= 0 is false
0x1 <= 0x0 is false
------------[ cut here ]------------
kernel BUG at ../net/rxrpc/ar-ack.c:431!
...
RIP: 0010:[<ffffffffa006857b>] [<ffffffffa006857b>] rxrpc_rotate_tx_window+0xbc/0x131 [af_rxrpc]
...
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.
To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.
Note: similar patch has been already submitted by Yoshifuji Hideaki in
http://patchwork.ozlabs.org/patch/220979/
but got lost and forgotten for some reason.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-11-23
Here's the first bluetooth-next pull request for the 4.5 kernel.
- Add new Get Advertising Size Information management command
- Add support for new system note message type on monitor channel
- Refactor LE scan changes behind separate workqueue to avoid races
- Fix issue with privacy feature when powering on adapter
- Various minor fixes & cleanups here and there
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The principal name on a gss cred is used to setup the NFSv4.0 callback,
which has to have a client principal name to authenticate to.
That code wants the name to be in the form servicetype@hostname.
rpc.svcgssd passes down such names (and passes down no principal name at
all in the case the principal isn't a service principal).
gss-proxy always passes down the principal name, and passes it down in
the form servicetype/hostname@REALM. So we've been munging the name
gss-proxy passes down into the format the NFSv4.0 callback code expects,
or throwing away the name if we can't.
Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've
also been using the principal name to verify that certain operations are
done as the same principal as was used on the original EXCHANGE_ID call.
For that application, the original name passed down by gss-proxy is also
useful.
Lack of that name in some cases was causing some kerberized NFSv4.1
mount failures in an Active Directory environment.
This fix only works in the gss-proxy case. The fix for legacy
rpc.svcgssd would be more involved, and rpc.svcgssd already has other
problems in the AD case.
Reported-and-tested-by: James Ralston <ralston@pobox.com>
Acked-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commit 09605cc12c ("net ipv4: use preferred log methods") replaced
a few calls of pr_cont() after a console print without a trailing
newline by pr_info(), causing lines to be split during IP
autoconfiguration, like:
.
,
OK
IP-Config: Got DHCP answer from 192.168.97.254,
my address is 192.168.97.44
Convert these back to using pr_cont(), so it prints again:
., OK
IP-Config: Got DHCP answer from 192.168.97.254, my address is 192.168.97.44
Absorb the printing of "my address ..." into the previous call to
pr_info(), as there's no reason to use a continuation there.
Convert one more pr_info() to print nameservers while we're at it.
Fixes: 09605cc12c ("net ipv4: use preferred log methods")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the introduction of the switch gpio reset API, I'm getting
build errors in configurations that disable CONFIG_GPIOLIB:
net/dsa/dsa.c:783:16: error: implicit declaration of function 'gpio_to_desc' [-Werror=implicit-function-declaration]
The reason is that linux/gpio/consumer.h is not automatically
included without gpiolib support. This adds an explicit #include
statement to make it compile in all configurations. The reset
functionality will not work without gpiolib, which is what you
get when disabling the feature.
As far as I can tell, gpiolib is supported on all architectures
on which you can have DSA at the moment.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: cc30c16344 ("net: dsa: Add support for a switch reset gpio")
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coverity says:
*** CID 1338065: Error handling issues (CHECKED_RETURN)
/net/tipc/udp_media.c: 162 in tipc_udp_send_msg()
156 struct udp_media_addr *dst = (struct udp_media_addr *)&dest->value;
157 struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value;
158 struct sk_buff *clone;
159 struct rtable *rt;
160
161 if (skb_headroom(skb) < UDP_MIN_HEADROOM)
>>> CID 1338065: Error handling issues (CHECKED_RETURN)
>>> Calling "pskb_expand_head" without checking return value (as is done elsewhere 51 out of 56 times).
162 pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
163
164 clone = skb_clone(skb, GFP_ATOMIC);
165 skb_set_inner_protocol(clone, htons(ETH_P_TIPC));
166 ub = rcu_dereference_rtnl(b->media_ptr);
167 if (!ub) {
When expanding buffer headroom over udp tunnel with pskb_expand_head(),
it's unfortunate that we don't check its return value. As a result, if
the function returns an error code due to the lack of memory, it may
cause unpredictable consequence as we unconditionally consider that
it's always successful.
Fixes: e53567948f ("tipc: conditionally expand buffer headroom over udp tunnel")
Reported-by: <scan-admin@coverity.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if we drain receive queue thoroughly in tipc_release() after tipc
socket is removed from rhashtable, it is possible that some packets
are in flight because some CPU runs receiver and did rhashtable lookup
before we removed socket. They will achieve receive queue, but nobody
delete them at all. To avoid this leak, we register a private socket
destructor to purge receive queue, meaning releasing packets pending
on receive queue will be delayed until the last reference of tipc
socket will be released.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A truncated cb_compound request will cause the client to decode null or
data from a previous callback for nfs4.1 backchannel case, or uninitialized
data for the nfs4.0 case. This is because the path through
svc_process_common() advances the request's iov_base and decrements iov_len
without adjusting the overall xdr_buf's len field. That causes
xdr_init_decode() to set up the xdr_stream with an incorrect length in
nfs4_callback_compound().
Fixing this for the nfs4.1 backchannel case first requires setting the
correct iov_len and page_len based on the length of received data in the
same manner as the nfs4.0 case.
Then the request's xdr_buf length can be adjusted for both cases based upon
the remaining iov_len and page_len.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The vmci_transport_notify_ops structures are never modified, so declare
them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The in_cache_ops and eg_cache_ops structures are never modified, so declare
them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor out common vif init code used in both tunnel and pimreg
initialization and create ipmr_init_vif_indev() function.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take rtnl in the beginning unconditionally as most options already need
it (one exception - MRT_DONE, see the comment inside), make the
lock/unlock places central and move out the switch() local variables.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not necessary to check for null as SLAB_PANIC is used and we'll
panic if the alloc fails, so just drop it.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial replace of ifdef with IS_BUILTIN().
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a switch to determine if optname is correct and set val accordingly.
This produces a much more straight-forward and readable code.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial code and comment style fixes, also removed some extra newlines,
spaces and tabs.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the helper pimsm_enabled() which replaces the old CONFIG_IP_PIMSM
define and is used to check if any version of PIM-SM has been enabled.
Use a single if defined(CONFIG_IP_PIMSM_V1) || defined(CONFIG_IP_PIMSM_V2)
for the pim-sm shared code. This is okay w.r.t IGMPMSG_WHOLEPKT because
only a VIFF_REGISTER device can send such packet, and it can't be
created if pimsm_enabled() is false.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before mroute_reg_vif_num was defined only if any of the CONFIG_PIMSM_
options were set, but that's not really necessary as the size of the
struct is the same in both cases (checked with pahole, both cases size
is 3256 bytes) and we can remove some unnecessary ifdefs to simplify the
code.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the table id check in ipmr_new_table and make it return error
pointer. We need this change for the upcoming netlink table manipulation
support in order to avoid code duplication and a race condition.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WARN_ON_ONCE() takes a condition, it doesn't take an error message. I
have converted this to WARN() instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor optimization: when dealing with write chunk XDR roundup, do
not post a Write WR for the zero bytes in the pad. Simply update
the write segment in the RPC-over-RDMA header to reflect the extra
pad bytes.
The Reply chunk is also a write chunk, but the server does not use
send_write_chunks() to send the Reply chunk. That's OK in this case:
the server Upper Layer typically marshals the Reply chunk contents
in a single contiguous buffer, without a separate tail for the XDR
pad.
The comments and the variable naming refer to "chunks" but what is
really meant is "segments." The existing code sends only one
xdr_write_chunk per RPC reply.
The fix assumes this as well. When the XDR pad in the first write
chunk is reached, the assumption is the Write list is complete and
send_write_chunks() returns.
That will remain a valid assumption until the server Upper Layer can
support multiple bulk payload results per RPC.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.
Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f94 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The classid of a process is changed either when a process is moved to
or from a cgroup or when the net_cls.classid file is updated.
Previously net_cls only supported propogating these changes to the
cgroup's related sockets when a process was added or removed from the
cgroup. This means it was neccessary to remove and re-add all processes
to a cgroup in order to update its classid. This change introduces
support for doing this dynamically - i.e. when the value is changed in
the net_cls_classid file, this will also trigger an update to the
classid associated with all sockets controlled by the cgroup.
This mimics the behaviour of other cgroup subsystems.
net_prio circumvents this issue by storing an index into a table with
each socket (and so any updates to the table, don't require updating
the value associated with the socket). net_cls, however, passes the
socket the classid directly, and so this additional step is needed.
Signed-off-by: Nina Schiff <ninasc@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch changed nf_ct_frag6_gather() to morph reassembled skb
with the previous one.
This means that the return value is always NULL or the skb argument.
So change it to an err value.
Instead of invoking NF_HOOK recursively with threshold to skip already-called hooks
we can now just return NF_ACCEPT to move on to the next hook except for
-EINPROGRESS (which means skb has been queued for reassembly), in which case we
return NF_STOLEN.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
commit 6aafeef03b
("netfilter: push reasm skb through instead of original frag skbs")
changed ipv6 defrag to not use the original skbs anymore.
So rather than keeping the original skbs around just to discard them
afterwards just use the original skbs directly for the fraglist of
the newly assembled skb and remove the extra clone/free operations.
The skb that completes the fragment queue is morphed into a the
reassembled one instead, just like ipv4 defrag.
openvswitch doesn't need any additional skb_morph magic anymore to deal
with this situation so just remove that.
A followup patch can then also remove the NF_HOOK (re)invocation in
the ipv6 netfilter defrag hook.
Cc: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eliminate some checkpatch issues by improved layout of if statements.
No changes detected by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change brace placement to eliminate checkpatch error.
No changes detected by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Changes comments to use netdev style.
No changes detected by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes a bunch of issues detected by checkpatch with regards to code
indentation.
No changes detected by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some boards have a gpio line tied to the switch reset pin. Allow this
gpio to be retrieved from the device tree, and take the switch out of
reset before performing the probe.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch increments the management interface revision due to
introduction of a new Get Advertising Size Information command and
various other fixes & improvements.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In order to enable advertising with privacy enabled, SMP has to be
registered in order to generate new RPA. During power on, it will be
registered at the very end which is the reason why advertising is not
enabled and it's not possible to enable it anymore due to mismatch
between hci_dev settings and actual controller state.
This fixes this problem by moving SMP registration earlier, just after
controller is powered (which is ok, because LE SMP will be already able
to decide on identity address to be used), but before advertising is
enabled.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There were a couple of code paths missed by the previous patch that
added a HCI status return parameter to __hci_req_sync. This patch adds
the missing assignments for them.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Several ARM default configurations give us warnings on recent
compilers about potentially uninitialized variables in the
nfnetlink code in two functions:
net/netfilter/nfnetlink_queue.c: In function 'nfqnl_build_packet_message':
net/netfilter/nfnetlink_queue.c:519:19: warning: 'nfnl_ct' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0)
Moving the rcu_dereference(nfnl_ct_hook) call outside of the
conditional code avoids the warning without forcing us to
preinitialize the variable.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: a4b4766c3c ("netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.
Fixes: 8229efdaef ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
CC: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When destroying an mrt table the static mfc entries and the static
devices are kept, which leads to devices that can never be destroyed
(because of refcnt taken) and leaked memory, for example:
unreferenced object 0xffff880034c144c0 (size 192):
comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
hex dump (first 32 bytes):
98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff .S.4.....S.4....
ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00 ................
backtrace:
[<ffffffff815c1b9e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811ea6e0>] kmem_cache_alloc+0x190/0x300
[<ffffffff815931cb>] ip_mroute_setsockopt+0x5cb/0x910
[<ffffffff8153d575>] do_ip_setsockopt.isra.11+0x105/0xff0
[<ffffffff8153e490>] ip_setsockopt+0x30/0xa0
[<ffffffff81564e13>] raw_setsockopt+0x33/0x90
[<ffffffff814d1e14>] sock_common_setsockopt+0x14/0x20
[<ffffffff814d0b51>] SyS_setsockopt+0x71/0xc0
[<ffffffff815cdbf6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff
Make sure that everything is cleaned on netns destruction.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David and HacKurx reported a following/similar size overflow triggered
in a grsecurity kernel, thanks to PaX's gcc size overflow plugin:
(Already fixed in later grsecurity versions by Brad and PaX Team.)
[ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314
cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr;
[ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7
[ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...]
[ 1002.296153] ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8
[ 1002.296162] ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8
[ 1002.296169] ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60
[ 1002.296176] Call Trace:
[ 1002.296190] [<ffffffff818129ba>] dump_stack+0x45/0x57
[ 1002.296200] [<ffffffff8121f838>] report_size_overflow+0x38/0x60
[ 1002.296209] [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300
[ 1002.296220] [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930
[ 1002.296228] [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60
[ 1002.296236] [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50
[ 1002.296243] [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60
[ 1002.296248] [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0
[ 1002.296257] [<ffffffff81693496>] __sys_recvmsg+0x46/0x80
[ 1002.296263] [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40
[ 1002.296271] [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85
Further investigation showed that this can happen when an *odd* number of
fds are being passed over AF_UNIX sockets.
In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
where i is the number of successfully passed fds, differ by 4 bytes due
to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
on 64 bit. The padding is used to align subsequent cmsg headers in the
control buffer.
When the control buffer passed in from the receiver side *lacks* these 4
bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
overflow in scm_detach_fds():
int cmlen = CMSG_LEN(i * sizeof(int)); <--- cmlen w/o tail-padding
err = put_user(SOL_SOCKET, &cm->cmsg_level);
if (!err)
err = put_user(SCM_RIGHTS, &cm->cmsg_type);
if (!err)
err = put_user(cmlen, &cm->cmsg_len);
if (!err) {
cmlen = CMSG_SPACE(i * sizeof(int)); <--- cmlen w/ 4 byte extra tail-padding
msg->msg_control += cmlen;
msg->msg_controllen -= cmlen; <--- iff no tail-padding space here ...
} ... wrap-around
F.e. it will wrap to a length of 18446744073709551612 bytes in case the
receiver passed in msg->msg_controllen of 20 bytes, and the sender
properly transferred 1 fd to the receiver, so that its CMSG_LEN results
in 20 bytes and CMSG_SPACE in 24 bytes.
In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
issue in my tests as alignment seems always on 4 byte boundary. Same
should be in case of native 32 bit, where we end up with 4 byte boundaries
as well.
In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
a single fd would mean that on successful return, msg->msg_controllen is
being set by the kernel to 24 bytes instead, thus more than the input
buffer advertised. It could f.e. become an issue if such application later
on zeroes or copies the control buffer based on the returned msg->msg_controllen
elsewhere.
Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).
Going over the code, it seems like msg->msg_controllen is not being read
after scm_detach_fds() in scm_recv() anymore by the kernel, good!
Relevant recvmsg() handler are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
and ___sys_recvmsg() places the updated length, that is, new msg_control -
old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
in the example).
Long time ago, Wei Yongjun fixed something related in commit 1ac70e7ad2
("[NET]: Fix function put_cmsg() which may cause usr application memory
overflow").
RFC3542, section 20.2. says:
The fields shown as "XX" are possible padding, between the cmsghdr
structure and the data, and between the data and the next cmsghdr
structure, if required by the implementation. While sending an
application may or may not include padding at the end of last
ancillary data in msg_controllen and implementations must accept both
as valid. On receiving a portable application must provide space for
padding at the end of the last ancillary data as implementations may
copy out the padding at the end of the control message buffer and
include it in the received msg_controllen. When recvmsg() is called
if msg_controllen is too small for all the ancillary data items
including any trailing padding after the last item an implementation
may set MSG_CTRUNC.
Since we didn't place MSG_CTRUNC for already quite a long time, just do
the same as in 1ac70e7ad2 to avoid an overflow.
Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
error in SCM_RIGHTS code sample"). Some people must have copied this (?),
thus it got triggered in the wild (reported several times during boot by
David and HacKurx).
No Fixes tag this time as pre 2002 (that is, pre history tree).
Reported-by: David Sterba <dave@jikos.cz>
Reported-by: HacKurx <hackurx@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tracepoint to show fib6 table lookups and result.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Get Advertising Size Information command allows to retrieve size
information for advertising data and scan response data fields depending
on the selected flags. This is useful if applications want to know the
available size ahead of time.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The if statements for checking the flags parameter could be written a
bit easier to read. This changes this. No functional behavior has been
changed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The instance range check for Add Advertising command is missing. If the
provided instance is out of range an Invalid Parameters error should be
returned. At the moment, the generic Failed error is returned. This
extra check ensures that clear error messages are returned.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
napi_alloc_skb() can return NULL.
We should not crash should this happen.
Fixes: 93f93a4404 ("net: move skb_mark_napi_id() into core networking stack")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 5266698661 ("tipc: let broadcast packet
reception use new link receive function") the broadcast send
link state was meant to always be set to LINK_ESTABLISHED, since
we don't need this link to follow the regular link FSM rules. It
was also the intention that this state anyway shouldn't impact
the run-time working state of the link, since the latter in
reality is controlled by the number of registered peers.
We have now discovered that this assumption is not quite correct.
If the broadcast link is reset because of too many retransmissions,
its state will inadvertently go to LINK_RESETTING, and never go
back to LINK_ESTABLISHED, because the LINK_FAILURE event was not
anticipated. This will work well once, but if it happens a second
time, the reset on a link in LINK_RESETTING has has no effect, and
neither the broadcast link nor the unicast links will go down as
they should.
Furthermore, it is confusing that the management tool shows that
this link is in UP state when that obviously isn't the case.
We now ensure that this state strictly follows the true working
state of the link. The state is set to LINK_ESTABLISHED when
the number of peers is non-zero, and to LINK_RESET otherwise.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of variables with Hungarian notation (l_ptr, n_ptr etc.)
has been significantly reduced over the last couple of years.
We now root out the last traces of this practice.
There are no functional changes in this commit.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move the definition of struct tipc_link from link.h to link.c in
order to minimize its exposure to the rest of the code.
When needed, we define new functions to make it possible for external
entities to access and set data in the link.
Apart from the above, there are no functional changes.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In our effort to have less code and include dependencies between
entities such as node, link and bearer, we try to narrow down
the exposed interface towards the node as much as possible.
In this commit, we move the definition of struct tipc_node, along
with many of its associated function declarations, from node.h to
node.c. We also move some function definitions from link.c and
name_distr.c to node.c, since they access fields in struct tipc_node
that should not be externally visible. The moved functions are renamed
according to new location, and made static whenever possible.
There are no functional changes in this commit.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the node FSM a node in state SELF_UP_PEER_UP cannot
change state inside a lock context, except when a TUNNEL_PROTOCOL
(SYNCH or FAILOVER) packet arrives. However, the node's individual
links may still change state.
Since each link now is protected by its own spinlock, we finally have
the conditions in place to convert the node spinlock to an rwlock_t.
If the node state and arriving packet type are rigth, we can let the
link directly receive the packet under protection of its own spinlock
and the node lock in read mode. In all other cases we use the node
lock in write mode. This enables full concurrent execution between
parallel links during steady-state traffic situations, i.e., 99+ %
of the time.
This commit implements this change.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation to allow parallel links to work more independently
from each other we introduce a per-link spinlock, to be stored in the
struct nodes's link entry area. Since the node lock still is a regular
spinlock there is no increase in parallellism at this stage.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The file name_distr.c currently contains three functions,
named_cluster_distribute(), tipc_publ_subcscribe() and
tipc_publ_unsubscribe() that all directly access fields in
struct tipc_node. We want to eliminate such dependencies, so
we move those functions to the file node.c and rename them to
tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
respectively.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function tipc_node_check_state() contains the core logics
for handling link synchronization and failover. For this reason,
it is important to keep it as comprehensible as possible.
In this commit, we make three small cleanups.
1) If the node is in state SELF_DOWN_PEER_LEAVING and the received
packet confirms that the peer has lost contact, there will be no
further action in this function. To make this clearer, we return
from the function directly after the state change.
2) Since commit 0f8b8e28fb ("tipc: eliminate risk of stalled
link synchronization") only the logically first TUNNEL_PROTO/SYNCH
packet can alter the link state and set the synch point,
independently of arrival order. Hence, there is not any longer any
need to adjust the synch value in case such packets arrive in
disorder. We remove this adjustment.
3) It is the intention that any message arriving on any of the links
may trig a check for and possible termination of a node SYNCH state.
A redundant and unnoticed check for tipc_link_is_synching() obviously
beats this purpose, with the effect that only packets arriving on the
synching link may currently end the synch state. We remove this check.
This change will further shorten the synchronization period between
parallel links.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 5cbb28a4bf ("tipc: linearize arriving NAME_DISTR
and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
tipc_udp_recv(). The location of the change was selected in order
to make the commit easily appliable to 'net' and 'stable'.
We now move this linearization to where it should be done, in the
functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_send_rcvq() is used for re-injecting data into tcp receive queue.
Problems :
- No check against size is performed, allowed user to fool kernel in
attempting very large memory allocations, eventually triggering
OOM when memory is fragmented.
- In case of fault during the copy we do not return correct errno.
Lets use alloc_skb_with_frags() to cook optimal skbs.
Fixes: 292e8d8c85 ("tcp: Move rcvq sending to tcp_input.c")
Fixes: c0e88ff0f2 ("tcp: Repair socket queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incrementing TCPFastOpenActiveFailed snmp stats multiple times
when the handshake experiences multiple SYN timeouts.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some middle-boxes black-hole the data after the Fast Open handshake
(https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf).
The exact reason is unknown. The work-around is to disable Fast Open
temporarily after multiple recurring timeouts with few or no data
delivered in the established state.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Advertising reordering window in ADDBA less than 64 can crash some APs,
an example is LinkSys WRT120N (with FW v1.0.07 build 002 Jun 18 2012).
On the other hand, a driver may need to limit Tx A-MPDU size for its own
reasons, like specific HW limitations.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove unneeded variable used to store return value.
Error reported by coccicheck.
Signed-off-by: Prasanna Karthik <mkarthi3@visteon.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Fix errors reported by checkpatch.
- ERROR: spaces required around that ':' (ctx:VxW)
- ERROR: open brace '{' following function declarations go on the next line
Signed-off-by: Prasanna Karthik <mkarthi3@visteon.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The kfree_skb() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The kfree_skb() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The hci_req_sync_cancel() is just as much related to the request
cleanup as hci_request_cancel_all() is. Just move the former into the
latter and do the cleanup from a single place in hci_dev_do_close().
The important thing is to avoid deadlocks by holding the req_sync
lock: previously hci_request_cancel_all was done right after releasing
the lock and with this patch it's right before taking it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The conn_unfinished variable makes the entire logic of
hci_connect_le() rather confusing. By restructuring and clarifying the
logic we can actually remove the conn_unfinished variable and still
keep the same behavior.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This helps simplify the logic in further patches (less cleanups to do
in this failure branch).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The hci_connect_le_scan() is (as the name implies) a master/central
role API, so it makes no sense in passing a role parameter to it. At
the same time this patch also fixes the direct advertising support for
LE L2CAP sockets where we now call the more appropriate hci_le_connect()
API if slave/peripheral role is desired.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The only user of this, le_scan_restart_work(), is so short and simple
that it makes sense to just merge the code there.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Merge le_scan_disable_work_complete into the main le_scan_disable_work
function and take advantage of the updated bredr_inquiry() to run the
Inquiry through hci_req_sync().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Passing the needed inquiry length to bredr_inquiry() makes it possible
to also use this helper for interleaved discovery where the controller
doesn't support simultaneous Inquiry & LE scan.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The recent changes to remove dependency on HCI in Add Device missed
out relevant changes for BR/EDR. This patch removes the left-overs and
ensures the right HCI command gets queued for BR/EDR.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since discovery also deals with LE scanning it makes sense to move it
behind the same req_workqueue as other LE scanning changes. This also
simplifies the logic since we do many of the actions in a synchronous
manner.
Part of this refactoring is moving hci_req_stop_discovery() to
hci_request.c. At the same time the function receives support for
properly handling the STOPPING state since that's the state we'll be
in when stopping through the req_workqueue.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since discovery also deals with LE scanning it makes sense to move it
behind the same req_workqueue as other LE scanning changes. This also
simplifies the logic since we do many of the actions in a synchronous
manner.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In some circumstances it may be useful to abort the request through
checks done in the request callback. To make the feature possible this
patch changes the return value of the request callback from void to
int and aborts the request if a non-zero value is returned.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As preparation for moving the discovery HCI commands behind
req_workqueue, add a helper and do the validity checks of the given
discovery type before proceeding further. This way we don't need to do
them again in hci_request.c.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To avoid any risks of races, place also these LE scan modification
work callbacks behind the same work queue as the other LE scan
changes.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are no more external users so this API can be made private.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We can easily use the new req_workqueue based background scan update
for the power on case. This also removes the last external user of
__hci_update_background_scan().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since explicit connect requests are also a sub-category of passive
scan updates, run them through the same workqueue as the other passive
scan changes.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In some cases it may be important to get the exact HCI status rather
than the converted HCI-to-errno value. Add an optional return
parameter to the hci_req_sync() API to allow for this. Since there are
no good HCI translation candidates for cancelation and timeout, use
the "unknown" status code for those cases.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There's no point in waiting for HCI activity in Add/Remove Device
since the effects of these calls are long-lasting and we can anyway
not report up to the application all HCI failures.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Instead of firing off a simple async request queue all background scan
updates through req_workqueue and use hci_req_sync() there to ensure
that no two updates overlap with each other.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Prepare hci_request.c to have code for doing synchronous HCI requests,
such as LE scanning or advertising changes. The necessary work
callbacks will be set up in hci_request_setup() and cleaned up in
hci_request_cancel_all(). The former is used when an HCI device get
registered, and the latter each time it is powered off (or
unregistered).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To make it clear which HCI request APIs target specifically
synchronous requests, add 'sync' to the API names.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
hci_request.c is a more natural place for the synchronous request
handling. Furthermore, we will soon need access to some of the
previously private-to-hci_core.c functions from hci_request.c.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The hci_conn_params_clear_all() function is only called from
hci_unregister_dev() at which point it's completely futile to try to
do any LE scanning updates. Simply remove this unnecessary function
call. At the same time we can make the function static since it's only
accessed from within the same c-file.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To enable controller specific logging, the userspace daemon has to have
the ability to log per controller. To facilitate this support, provide
a dedicated logging channel. Messages in this channel will be included
in the monitor queue and with that also forwarded to monitoring tools
along with the actual hardware traces.
All messages from the logging channel are timestamped and with that
allow an easy correlation between userspace messages and hardware
events. This will increase the ability to debug problems faster.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The monitor channel can be used to send generic system notes as text
strings for debugging purposes. This adds the system note monitor code
and uses it for including kernel and subsystem version into traces.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The HCI sockets code has still some old casting coding style. Fix this
to match with the rest of the code.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
It's not obvious why schedule_work is used instead of queue_work. Add
a comment explaining why.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When HCI commands are injected via the raw socket, the core was not
including the decoded opcode value. So ensure that it is actually set.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
We can reduce the size of the hci_ctrl struct by converting
'bool req_start' to 'u8 req_flags' and making the two function
pointers a union (since only one is ever set at a time).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The new hci_skb_pkt_* wrappers only help if they are used consistently
in the Bluetooth subsystem. So first convert the core packet handling.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
For the LE only controllers, there are events that should not be enabled
if the corresponding command is not supported.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When setting the event mask, the HCI_QUIRK_FIXUP_INQUIRY_MODE quirk is
required to be checked so that the Inquiry Result with RSSI event gets
actually enabled.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The LE event mask should be created based on the commands that are
actually supported by the controller.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There are some BR/EDR default events for Bluetooth 1.2 or later
controllers that are not conditional on their features being present.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When a passive TCP is created, we eventually call tcp_md5_do_add()
with sk pointing to the child. It is not owner by the user yet (we
will add this socket into listener accept queue a bit later anyway)
But we do own the spinlock, so amend the lockdep annotation to avoid
following splat :
[ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage!
[ 8451.090932]
[ 8451.090932] other info that might help us debug this:
[ 8451.090932]
[ 8451.090934]
[ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1
[ 8451.090936] 3 locks held by socket_sockopt_/214795:
[ 8451.090936] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff855c6ac1>] __netif_receive_skb_core+0x151/0xe90
[ 8451.090947] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff85618143>] ip_local_deliver_finish+0x43/0x2b0
[ 8451.090952] #2: (slock-AF_INET){+.-...}, at: [<ffffffff855acda5>] sk_clone_lock+0x1c5/0x500
[ 8451.090958]
[ 8451.090958] stack backtrace:
[ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_
[ 8451.091215] Call Trace:
[ 8451.091216] <IRQ> [<ffffffff856fb29c>] dump_stack+0x55/0x76
[ 8451.091229] [<ffffffff85123b5b>] lockdep_rcu_suspicious+0xeb/0x110
[ 8451.091235] [<ffffffff8564544f>] tcp_md5_do_add+0x1bf/0x1e0
[ 8451.091239] [<ffffffff85645751>] tcp_v4_syn_recv_sock+0x1f1/0x4c0
[ 8451.091242] [<ffffffff85642b27>] ? tcp_v4_md5_hash_skb+0x167/0x190
[ 8451.091246] [<ffffffff85647c78>] tcp_check_req+0x3c8/0x500
[ 8451.091249] [<ffffffff856451ae>] ? tcp_v4_inbound_md5_hash+0x11e/0x190
[ 8451.091253] [<ffffffff85647170>] tcp_v4_rcv+0x3c0/0x9f0
[ 8451.091256] [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091260] [<ffffffff856181b6>] ip_local_deliver_finish+0xb6/0x2b0
[ 8451.091263] [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091267] [<ffffffff85618d38>] ip_local_deliver+0x48/0x80
[ 8451.091270] [<ffffffff85618510>] ip_rcv_finish+0x160/0x700
[ 8451.091273] [<ffffffff8561900e>] ip_rcv+0x29e/0x3d0
[ 8451.091277] [<ffffffff855c74b7>] __netif_receive_skb_core+0xb47/0xe90
Fixes: a8afca0329 ("tcp: md5: protects md5sig_info with RCU")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes the definition of the pointer _expiry from time_t to
time64_t. This is to handle the Y2038 problem where time_t
will overflow in the year 2038. The change is safe because
the kernel subsystems that call dns_query pass NULL.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Aya Mahfouz <mahfouz.saif.elyazal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the commit cdf3464e6c ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
introduced percpu storage for ip6_tunnel dst cache, but while clearing
such cache it used raw_cpu_ptr to walk the per cpu entries, so cached
dst on non current cpu are not actually reset.
This patch replaces raw_cpu_ptr with per_cpu_ptr, properly cleaning
such storage.
Fixes: cdf3464e6c ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NAPI drivers no longer need to observe a particular protocol
to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
napi_hash_add() and napi_hash_del() are automatically called
from core networking stack, respectively from
netif_napi_add() and netif_napi_del()
This patch depends on free_netdev() and netif_napi_del() being
called from process context, which seems to be the norm.
Drivers might still prefer to call napi_hash_del() on their
own, since they might combine all the rcu grace periods into
a single one, knowing their NAPI structures lifetime, while
core networking stack has no idea of a possible combining.
Once this patch proves to not bring serious regressions,
we will cleanup drivers to either remove napi_hash_del()
or provide appropriate rcu grace periods combining.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi_hash_del() will soon be used from both drivers (if they want)
or core networking stack.
Callers are responsibles to ensure an RCU grace period is respected
before freeing napi structure : napi_hash_del() can signal if
this RCU grace period is needed or not.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not often add/delete a napi context.
Moving napi_hash[] into read mostly section avoids potential false sharing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_tx_napi_add() is a variant of netif_napi_add()
It should be used by drivers that use a napi structure
to exclusively poll TX.
We do not want to add this kind of napi in napi_hash[] in following
patches, adding generic busy polling to all NAPI drivers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We would like to automatically provide busy polling support
to all NAPI drivers, without them having to implement anything.
skb_mark_napi_id() can be called from napi_gro_receive() and
napi_get_frags().
Few drivers are still calling skb_mark_napi_id() because
they use netif_receive_skb(). They should eventually call
napi_gro_receive() instead. I will leave this to drivers
maintainers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having to implement complex ndo_busy_poll() method,
drivers can simply rely on NAPI poll logic.
Busy polling gains are mainly coming from polling itself,
not on exact details on how we poll the device.
ndo_busy_poll() if implemented can avoid touching
napi state, but it adds extra synchronization between
normal napi->poll() and busy poll handler, slowing down
the common path (non busy polling) with extra atomic operations.
In practice few drivers ever got busy poll because of the complexity.
We could go one step further, and make busy polling
available for all NAPI drivers, but this would require
that all netif_napi_del() calls are done in process context
so that we can call synchronize_rcu().
Full audit would be required.
Before this is done, a driver still needs to call :
- skb_mark_napi_id() for each skb provided to the stack.
- napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
- Make sure RCU grace period is respected after napi_hash_del() before
memory containing napi structure is freed.
Followup patch implements busy poll for mlx5 driver as an example.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of blocking BH in whole sk_busy_loop(), block them
only around ->ndo_busy_poll() calls.
This has many benefits.
1) allow tunneled traffic to use busy poll as well as native traffic.
Tunnels handlers usually call netif_rx() and depend on net_rx_action()
being run (from sofirq handler)
2) allow RFS/RPS being used (sending IPI to other cpus if needed)
3) use the 'lets burn cpu cycles' budget to do useful work
(like TX completions, timers, RCU callbacks...)
4) reduce BH latencies, making busy poll a better citizen.
Tested:
Tested with SIT tunnel
lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec
16384 87380 1 1 10.00 37373.93
16384 87380
Now enable busy poll on both hosts
lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
lpaa6:~# echo 70 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec
16384 87380 1 1 10.00 58314.77
16384 87380
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is really little gain from inlining this big function.
We'll soon make it even bigger in following patches.
This means we no longer need to export napi_by_id()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.
We had to call skb_sender_cpu_clear() in some places to
not leave a prior skb->napi_id and fool netdev_pick_tx()
As suggested by Alexei, we could split the space so that
these errors can not happen.
0 value being reserved as the common (not initialized) value,
let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
and [NR_CPUS+1 .. ~0U] for valid napi_id.
This will allow proper busy polling support over tunnels.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace printk calls with preferred unconditional log method calls to keep
kernel messages clean.
Added newline to "too small MTU" message.
Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix list tests in netfilter ingress support, from Florian Westphal.
2) Fix reversal of input and output interfaces in ingress hook
invocation, from Pablo Neira Ayuso.
3) We have a use after free in r8169, caught by Dave Jones, fixed by
Francois Romieu.
4) Splice use-after-free fix in AF_UNIX frmo Hannes Frederic Sowa.
5) Three ipv6 route handling bug fixes from Martin KaFai Lau:
a) Don't create clone routes not managed by the fib6 tree
b) Don't forget to check expiration of DST_NOCACHE routes.
c) Handle rt->dst.from == NULL properly.
6) Several AF_PACKET fixes wrt transport header setting and SKB
protocol setting, from Daniel Borkmann.
7) Fix thunder driver crash on shutdown, from Pavel Fedin.
8) Several Mellanox driver fixes (max MTU calculations, use of correct
DMA unmap in TX path, etc.) from Saeed Mahameed, Tariq Toukan, Doron
Tsur, Achiad Shochat, Eran Ben Elisha, and Noa Osherovich.
9) Several mv88e6060 DSA driver fixes (wrong bit definitions for
certain registers, etc.) from Neil Armstrong.
10) Make sure to disable preemption while updating per-cpu stats of ip
tunnels, from Jason A. Donenfeld.
11) Various ARM64 bpf JIT fixes, from Yang Shi.
12) Flush icache properly in ARM JITs, from Daniel Borkmann.
13) Fix masking of RX and TX interrupts in ravb driver, from Masaru
Nagai.
14) Fix netdev feature propagation for devices not implementing
->ndo_set_features(). From Nikolay Aleksandrov.
15) Big endian fix in vmxnet3 driver, from Shrikrishna Khare.
16) RAW socket code increments incorrect SNMP counters, fix from Ben
Cartwright-Cox.
17) IPv6 multicast SNMP counters are bumped twice, fix from Neil Horman.
18) Fix handling of VLAN headers on stacked devices when REORDER is
disabled. From Vlad Yasevich.
19) Fix SKB leaks and use-after-free in ipvlan and macvlan drivers, from
Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
MAINTAINERS: Update Mellanox's Eth NIC driver entries
net/core: revert "net: fix __netdev_update_features return.." and add comment
af_unix: take receive queue lock while appending new skb
rtnetlink: fix frame size warning in rtnl_fill_ifinfo
net: use skb_clone to avoid alloc_pages failure.
packet: Use PAGE_ALIGNED macro
packet: Don't check frames_per_block against negative values
net: phy: Use interrupts when available in NOLINK state
phy: marvell: Add support for 88E1540 PHY
arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
macvlan: fix leak in macvlan_handle_frame
ipvlan: fix use after free of skb
ipvlan: fix leak in ipvlan_rcv_frame
vlan: Do not put vlan headers back on bridge and macvlan ports
vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
via-velocity: unconditionally drop frames with bad l2 length
ipg: Remove ipg driver
dl2k: Add support for IP1000A-based cards
snmp: Remove duplicate OUTMCAST stat increment
net: thunder: Check for driver data in nicvf_remove()
...
This reverts commit 00ee592717 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While possibly in future we don't necessarily need to use
sk_buff_head.lock this is a rather larger change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.
For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use locked skb_queue_tail.
Fixes: 869e7c6248 ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following warning:
CC net/core/rtnetlink.o
net/core/rtnetlink.c: In function ‘rtnl_fill_ifinfo’:
net/core/rtnetlink.c:1308:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
}
^
by splitting up the huge rtnl_fill_ifinfo into some smaller ones, so we
don't have the huge frame allocations at the same time.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. new skb only need dst and ip address(v4 or v6).
2. skb_copy may need high order pages, which is very rare on long running server.
Signed-off-by: Junwei Zhang <linggao.zjw@alibaba-inc.com>
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use PAGE_ALIGNED(...) instead of open-coding it.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
rb->frames_per_block is an unsigned int, thus can never be negative.
Also fix spacing in the calculation of frames_per_block.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a vlan is configured with REORDER_HEADER set to 0, the vlan
header is put back into the packet and makes it appear that
the vlan header is still there even after it's been processed.
This posses a problem for bridge and macvlan ports. The packets
passed to those device may be forwarded and at the time of the
forward, vlan headers end up being unexpectedly present.
With the patch, we make sure that we do not put the vlan header
back (when REORDER_HEADER is 0) if a bridge or macvlan has
been configured on top of the vlan device.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have multiple stacked vlan devices all of which have
turned off REORDER_HEADER flag, the untag operation does not
locate the ethernet addresses correctly for nested vlans.
The reason is that in case of REORDER_HEADER flag being off,
the outer vlan headers are put back and the mac_len is adjusted
to account for the presense of the header. Then, the subsequent
untag operation, for the next level vlan, always use VLAN_ETH_HLEN
to locate the begining of the ethernet header and that ends up
being a multiple of 4 bytes short of the actuall beginning
of the mac header (the multiple depending on the how many vlan
encapsulations ethere are).
As a reslult, if there are multiple levles of vlan devices
with REODER_HEADER being off, the recevied packets end up
being dropped.
To solve this, we use skb->mac_len as the offset. The value
is always set on receive path and starts out as a ETH_HLEN.
The value is also updated when the vlan header manupations occur
so we know it will be correct.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using call_rcu(), the called function may be delayed quite
significantly, and without a matching rcu_barrier() there's no
way to be sure it has finished.
Therefore, global state that could be gone/freed/reused should
never be touched in the callback.
Fix this in mesh by moving the atomic_dec() into the caller;
that's not really a problem since we already unlinked the path
and it will be destroyed anyway.
This fixes a crash Jouni observed when running certain tests in
a certain order, in which the mesh interface was torn down, the
memory reused for a function pointer (work struct) and running
that then crashed since the pointer had been decremented by 1,
resulting in an invalid instruction byte stream.
Cc: stable@vger.kernel.org
Fixes: eb2b9311fd ("mac80211: mesh path table implementation")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For now, this feature doesn't actually work. To avoid shipping a
kernel that has it enabled but where it can't be used disable it
for now - we can re-enable it when it's fixed.
This partially reverts 44674d9c22 ("mac80211: advertise support
for full station state in AP mode").
Cc: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path. Remove the mcast bump, as its
not needed
Validated by the reporter, with good results
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Claus Jensen <claus.jensen@microsemi.com>
CC: Claus Jensen <claus.jensen@microsemi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent flaw in the netdev feature setting resulted in warnings
like this one from VLAN interfaces:
WARNING: CPU: 1 PID: 4975 at net/core/dev.c:2419 skb_warn_bad_offload+0xbc/0xcb()
: caps=(0x00000000001b5820, 0x00000000001b5829) len=2782 data_len=0 gso_size=1348 gso_type=16 ip_summed=3
The ":" is supposed to be preceded by a driver name, but in this
case it is an empty string since the device has no parent.
There are many types of network devices without a parent. The
anonymous warnings for these devices can be hard to debug. Log
the network device name instead in these cases to assist further
debugging.
This is mostly similar to how __netdev_printk() handles orphan
devices.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case multiple writes to a unix stream socket race we could end up in a
situation where we pre-allocate a new skb for use in unix_stream_sendpage
but have to free it again in the locked section because another skb
has been appended meanwhile, which we must use. Accidentally we didn't
clear the pointer after consuming it and so we touched freed memory
while appending it to the sk_receive_queue. So, clear the pointer after
consuming the skb.
This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.
Fixes: 869e7c6248 ("net: af_unix: implement stream sendpage support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sending ICMP packets with raw sockets ends up in the SNMP counters
logging the type as the first byte of the IPv4 header rather than
the ICMP header. This is fixed by adding the IP Header Length to
the casting into a icmphdr struct.
Signed-off-by: Ben Cartwright-Cox <ben@benjojo.co.uk>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If ndo_set_features fails __netdev_update_features() will return -1 but
this is wrong because it is expected to return 0 if no features were
changed (see netdev_update_features()), which will cause a netdev
notifier to be called without any actual changes. Fix this by returning
0 if ndo_set_features fails.
Fixes: 6cb6a27c45 ("net: Call netdev_features_change() from netdev_update_features()")
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When __netdev_update_features() was updated to ensure some features are
disabled on new lower devices, an error was introduced for devices which
don't have the ndo_set_features() method set. Before we'll just set the
new features, but now we return an error and don't set them. Fix this by
returning the old behaviour and setting err to 0 when ndo_set_features
is not present.
Fixes: e7868a85e1 ("net/core: ensure features get disabled on new lower devs")
CC: Jarod Wilson <jarod@redhat.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Ido Schimmel <idosch@mellanox.com>
CC: Sander Eikelenboom <linux@eikelenboom.it>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Dave Young <dyoung@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When NET_SWITCHDEV=n, switchdev_port_attr_set simply returns EOPNOTSUPP.
In this case we should not emit errors and warnings to the kernel log.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 0bc05d585d ("switchdev: allow caller to explicitly request
attr_set as deferred")
Fixes: 6ac311ae8b ("Adding switchdev ageing notification on port
bridged")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNACK packets might be attached to request sockets.
Use skb_to_full_sk() helper to avoid illegal accesses to
inet_sk(skb->sk)
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some functions access TCP sockets without holding a lock and
might output non consistent data, depending on compiler and or
architecture.
tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...
Introduce sk_state_load() and sk_state_store() to fix the issues,
and more clearly document where this lack of locking is happening.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
now sctp auth cannot work well when setting a hmacid manually, which
is caused by that we didn't use the network order for hmacid, so fix
it by adding the transformation in sctp_auth_ep_set_hmacs.
even we set hmacid with the network order in userspace, it still
can't work, because of this condition in sctp_auth_ep_set_hmacs():
if (id > SCTP_AUTH_HMAC_ID_MAX)
return -EOPNOTSUPP;
so this wasn't working before and thus it won't break compatibility.
Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since it's introduction in commit 69e3c75f4d ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in it's original form as dev->hard_header_len).
I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar as we have it in packet_snd().
Fixes: 69e3c75f4d ("net: TX_RING and packet mmap")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).
Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.
That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.
Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and therefore being accessed on the skb after
the skb_store_bits().
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.
This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).
Fixes: 57f89bfa21 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
Fixes: 52f1454f62 ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.
While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.
Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tpacket_fill_skb() commit c1aad275b0 ("packet: set transport
header before doing xmit") and later on 40893fd0fd ("net: switch
to use skb_probe_transport_header()") was probing for a transport
header on the skb from a ring buffer slot, but at a time, where
the skb has _not even_ been filled with data yet. So that call into
the flow dissector is pretty useless. Lets do it after we've set
up the skb frags.
Fixes: c1aad275b0 ("packet: set transport header before doing xmit")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All DST_NOCACHE rt6_info used to have rt->dst.from set to
its parent.
After commit 8e3d5be736 ("ipv6: Avoid double dst_free"),
DST_NOCACHE is also set to rt6_info which does not have
a parent (i.e. rt->dst.from is NULL).
This patch catches the rt->dst.from == NULL case.
Fixes: 8e3d5be736 ("ipv6: Avoid double dst_free")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the expires of the DST_NOCACHE rt can be set during
the ip6_rt_update_pmtu(), we also need to consider the expires
value when doing ip6_dst_check().
This patches creates __rt6_check_expired() to only
check the expire value (if one exists) of the current rt.
In rt6_dst_from_check(), it adds __rt6_check_expired() as
one of the condition check.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272571
The setup has a IPv4 GRE tunnel running in a IPSec. The bug
happens when ndisc starts sending router solicitation at the gre
interface. The simplified oops stack is like:
__lock_acquire+0x1b2/0x1c30
lock_acquire+0xb9/0x140
_raw_write_lock_bh+0x3f/0x50
__ip6_ins_rt+0x2e/0x60
ip6_ins_rt+0x49/0x50
~~~~~~~~
__ip6_rt_update_pmtu.part.54+0x145/0x250
ip6_rt_update_pmtu+0x2e/0x40
~~~~~~~~
ip_tunnel_xmit+0x1f1/0xf40
__gre_xmit+0x7a/0x90
ipgre_xmit+0x15a/0x220
dev_hard_start_xmit+0x2bd/0x480
__dev_queue_xmit+0x696/0x730
dev_queue_xmit+0x10/0x20
neigh_direct_output+0x11/0x20
ip6_finish_output2+0x21f/0x770
ip6_finish_output+0xa7/0x1d0
ip6_output+0x56/0x190
~~~~~~~~
ndisc_send_skb+0x1d9/0x400
ndisc_send_rs+0x88/0xc0
~~~~~~~~
The rt passed to ip6_rt_update_pmtu() is created by
icmp6_dst_alloc() and it is not managed by the fib6 tree,
so its rt6i_table == NULL. When __ip6_rt_update_pmtu() creates
a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
and it causes the ip6_ins_rt() oops.
During pmtu update, we only want to create a RTF_CACHE clone
from a rt which is currently managed (or owned) by the
fib6 tree. It means either rt->rt6i_node != NULL or
rt is a RTF_PCPU clone.
It is worth to note that rt6i_table may not be NULL even it is
not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
Hence, rt6i_node is a better check instead of rt6i_table.
Fixes: 45e4fd2668 ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in process of being spliced we get a
use-after-free report by kasan.
First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.
Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.
This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.
Fixes: 2b514574f7 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
req->rc is pre-allocated early on with p9_tag_alloc and shouldn't be missing
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Most of the changes here are no-op and just renaming to use a
fcall struct, needed for p9_parse_header
It fixes the unaligned memory access to read the tag and defers to
common functions for part of the protocol knowledge (although header
length is still hard-coded...)
Reported-By: Rob Landley <rob@landley.net>
Signed-Off-By: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
If we use wrong device name 9p mount fails with error
"9pnet_virtio: no channels available"
Improve the error output as below
"9pnet_virtio: no channels available for device /dev/root"
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Pull Ceph updates from Sage Weil:
"There are several patches from Ilya fixing RBD allocation lifecycle
issues, a series adding a nocephx_sign_messages option (and associated
bug fixes/cleanups), several patches from Zheng improving the
(directory) fsync behavior, a big improvement in IO for direct-io
requests when striping is enabled from Caifeng, and several other
small fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: clear msg->con in ceph_msg_release() only
libceph: add nocephx_sign_messages option
libceph: stop duplicating client fields in messenger
libceph: drop authorizer check from cephx msg signing routines
libceph: msg signing callouts don't need con argument
libceph: evaluate osd_req_op_data() arguments only once
ceph: make fsync() wait unsafe requests that created/modified inode
ceph: add request to i_unsafe_dirops when getting unsafe reply
libceph: introduce ceph_x_authorizer_cleanup()
ceph: don't invalidate page cache when inode is no longer used
rbd: remove duplicate calls to rbd_dev_mapping_clear()
rbd: set device_type::release instead of device::release
rbd: don't free rbd_dev outside of the release callback
rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails
libceph: use local variable cursor instead of &msg->cursor
libceph: remove con argument in handle_reply()
ceph: combine as many iovec as possile into one OSD request
ceph: fix message length computation
ceph: fix a comment typo
rbd: drop null test before destroy functions
Pablo Neira Ayuso:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree. This
large batch that includes fixes for ipset, netfilter ingress, nf_tables
dynamic set instantiation and a longstanding Kconfig dependency problem.
More specifically, they are:
1) Add missing check for empty hook list at the ingress hook, from
Florian Westphal.
2) Input and output interface are swapped at the ingress hook,
reported by Patrick McHardy.
3) Resolve ipset extension alignment issues on ARM, patch from Jozsef
Kadlecsik.
4) Fix bit check on bitmap in ipset hash type, also from Jozsef.
5) Release buckets when all entries have expired in ipset hash type,
again from Jozsef.
6) Oneliner to initialize conntrack tuple object in the PPTP helper,
otherwise the conntrack lookup may fail due to random bits in the
structure holes, patch from Anthony Lineham.
7) Silence a bogus gcc warning in nfnetlink_log, from Arnd Bergmann.
8) Fix Kconfig dependency problems with TPROXY, socket and dup, also
from Arnd.
9) Add __netdev_alloc_pcpu_stats() to allow creating percpu counters
from atomic context, this is required by the follow up fix for
nf_tables.
10) Fix crash from the dynamic set expression, we have to add new clone
operation that should be defined when a simple memcpy is not enough.
This resolves a crash when using per-cpu counters with new Patrick
McHardy's flow table nft support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
the breakup of the big NFSv4 state lock in 3.17--thanks especially to
Andrew Elble and Jeff Layton for tracking down some of the remaining
races.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWQ51oAAoJECebzXlCjuG+b1AP+wWamMNw8eS0N98+KslfTMNd
BFcOFp6L5Hv1VRuwl67V9UUNS+9y5rsgWh9gMnQe5yPZ+dVABbO6mKfh3f7HJ2zg
aGzmE9ZdTMejjRDpSHHMEqxYePlxGVgxVhr8CgnzgkXf6KBEy3emfDlFIocgFWdR
JGWEhfOoOa+H7b3Awq3KxlxhAajq1ic14un2CLxTYdvjshwhlIjnscY1F7vNiNRg
O/ELQRKCCSNZbwGSV/OhNUXx3VQPQUh1eMvIfSD3Fs6AtMybIWGW5Fc36jxZJKt2
kllcEfukRnGa+Ezl/hwBWd1pEVMwmkYhNRt+9LtH8uWG4+uT5i3Mxn4taDYg8618
plp2GtRoC3VwOvUKEcKl3HhlRBu5H4zJtx9x60NDzAgNUtoKG/Dl7rhm7o7QwUk8
W3k0jYAryoyx+12fYO0dssdM4pj1Gi7nRGR687lKzXXttktbQF88ZS9EHhI3oFiC
Ak8ilEeap8ND1KIJY6Z2xr925BPpw+P2GXbd/Mr5H0aX3a3WM3wLPXlToZbve5EP
haYnTqbHw9QzqbLDcki8s0hNgv+xwlQWopoInGijfr3IAHQMpKStI0WxhyTYsb8g
0xyRWA1COnj5WvgznbkMky7Q45T27q26EFgaS8+LEJ1rtEpmNDZOaycbwym6XQHk
1oyydIRSWM3c7eWnDvFG
=wRg0
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Apologies for coming a little late in the merge window. Fortunately
this is another fairly quiet one:
Mainly smaller bugfixes and cleanup. We're still finding some bugs
from the breakup of the big NFSv4 state lock in 3.17 -- thanks
especially to Andrew Elble and Jeff Layton for tracking down some of
the remaining races"
* tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux:
svcrpc: document lack of some memory barriers
nfsd: fix race with open / open upgrade stateids
nfsd: eliminate sending duplicate and repeated delegations
nfsd: remove recurring workqueue job to clean DRC
SUNRPC: drop stale comment in svc_setup_socket()
nfsd: ensure that seqid morphing operations are atomic wrt to copies
nfsd: serialize layout stateid morphing operations
nfsd: improve client_has_state to check for unused openowners
nfsd: fix clid_inuse on mount with security change
sunrpc/cache: make cache flushing more reliable.
nfsd: move include of state.h from trace.c to trace.h
sunrpc: avoid warning in gss_key_timeout
lockd: get rid of reference-counted NSM RPC clients
SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage()
lockd: create NSM handles per net namespace
nfsd: switch unsigned char flags in svc_fh to bools
nfsd: move svc_fh->fh_maxsize to just after fh_handle
nfsd: drop null test before destroy functions
nfsd: serialize state seqid morphing operations
The L2CAP core expects channel implementations to manage the reference
returned by the new_connection callback. With sockets this is already
handled with each channel being tied to the corresponding socket. With
SMP however there's no context to tie the pointer to in the
smp_new_conn_cb function. The function can also not just drop the
reference since it's the only one at that point.
For fixed channels (like SMP) the code path inside the L2CAP core from
new_connection() to ready() is short and straight-forwards. The
crucial difference is that in ready() the implementation has access to
the l2cap_conn that SMP needs associate its l2cap_chan. Instead of
taking a new reference in smp_ready_cb() we can simply assume to
already own the reference created in smp_new_conn_cb(), i.e. there is
no need to call l2cap_chan_hold().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.19+
Pull networking fixes from David Miller:
1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.
2) Several spots need to get to the original listner for SYN-ACK
packets, most spots got this ok but some were not. Whilst covering
the remaining cases, create a helper to do this. From Eric Dumazet.
3) Missiing check of return value from alloc_netdev() in CAIF SPI code,
from Rasmus Villemoes.
4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.
5) Use after free in mvneta driver, from Justin Maggard.
6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.
7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd
Bergmann.
8) Fix multicast getsockopt deadlock, from WANG Cong.
9) Fix deadlock in btusb, from Kuba Pawlak.
10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
counter state. From Sabrina Dubroca.
11) Fix packet_bind() race, which can cause lost notifications, from
Francesco Ruggeri.
12) Fix MAC restoration in qlcnic driver during bonding mode changes,
from Jarod Wilson.
13) Revert bridging forward delay change which broke libvirt and other
userspace things, from Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
Revert "bridge: Allow forward delay to be cfgd when STP enabled"
bpf_trace: Make dependent on PERF_EVENTS
qed: select ZLIB_INFLATE
net: fix a race in dst_release()
net: mvneta: Fix memory use after free.
net: Documentation: Fix default value tcp_limit_output_bytes
macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
mvneta: add FIXED_PHY dependency
net: caif: check return value of alloc_netdev
net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
drivers: net: xgene: fix RGMII 10/100Mb mode
netfilter: nft_meta: use skb_to_full_sk() helper
net_sched: em_meta: use skb_to_full_sk() helper
sched: cls_flow: use skb_to_full_sk() helper
netfilter: xt_owner: use skb_to_full_sk() helper
smack: use skb_to_full_sk() helper
net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
bpf: doc: correct arch list for supported eBPF JIT
dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
bonding: fix panic on non-ARPHRD_ETHER enslave failure
...
With the conversion of the counter expressions to make it percpu, we
need to clone the percpu memory area, otherwise we crash when using
counters from flow tables.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Kconfig is too smart for its own good: a Kconfig line that states
select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
means that if IP6_NF_IPTABLES is set to 'm', then NF_DEFRAG_IPV6 will
also be set to 'm', regardless of the state of the symbol from which
it is selected. When the xt_TEE driver is built-in and nothing else
forces NF_DEFRAG_IPV6 to be built-in, this causes a link-time error:
net/built-in.o: In function `tee_tg6':
net/netfilter/xt_TEE.c:46: undefined reference to `nf_dup_ipv6'
This works around that behavior by changing the dependency to
'if IP6_NF_IPTABLES != n', which is interpreted as boolean expression
rather than a tristate and causes the NF_DEFRAG_IPV6 symbol to
be built-in as well.
The bug only occurs once in thousands of 'randconfig' builds and
does not really impact real users. From inspecting the other
surrounding Kconfig symbols, I am guessing that NETFILTER_XT_TARGET_TPROXY
and NETFILTER_XT_MATCH_SOCKET have the same issue. If not, this
change should still be harmless.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After a recent (correct) change, gcc started warning about the use
of the 'flags' variable in nfulnl_recv_config()
net/netfilter/nfnetlink_log.c: In function 'nfulnl_recv_config':
net/netfilter/nfnetlink_log.c:320:14: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/nfnetlink_log.c:828:6: note: 'flags' was declared here
The warning first shows up in ARM s3c2410_defconfig with gcc-4.3 or
higher (including 5.2.1, which is the latest version I checked) I
tried working around it by rearranging the code but had no success
with that.
As a last resort, this initializes the variable to zero, which shuts
up the warning, but means that we don't get a warning if the code
is ever changed in a way that actually causes the variable to be
used without first being written.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8cbc870829 ("netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicity")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We're missing memory barriers in net/sunrpc/svcsock.c in some spots we'd
expect them. But it doesn't appear they're necessary in our case, and
this is likely a hot path--for now just document the odd behavior.
Kosuke Tatsukawa found this issue while looking through the linux source
code for places calling waitqueue_active() before wake_up*(), but
without preceding memory barriers, after sending a patch to fix a
similar issue in drivers/tty/n_tty.c (Details about the original issue
can be found here: https://lkml.org/lkml/2015/9/28/849).
Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit 34c2d9fb04.
There are 2 reasons for this revert:
1) The commit in question doesn't do what it says it does. The
description reads: "Allow bridge forward delay to be configured
when Spanning Tree is enabled." This was already the case before
the commit was made. What the commit actually do was disallow
invalid values or 'forward_delay' when STP was turned off.
2) The above change was actually a change in the user observed
behavior and broke things like libvirt and other network configs
that set 'forward_delay' to 0 without enabling STP. The value
of 0 is actually used when STP is turned off to immediately mark
the bridge as forwarding.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The svc_setup_socket() function does set the send and receive buffer
sizes, so the comment is out-of-date:
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Only cpu seeing dst refcount going to 0 can safely
dereference dst->flags.
Otherwise an other cpu might already have freed the dst.
Fixes: 27b75c95f1 ("net: avoid RCU for NOCACHE dst")
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Highlights include:
Features:
- RDMA client backchannel from Chuck
- Support for NFSv4.2 file CLONE using the btrfs ioctl
Bugfixes + cleanups
- Move socket data receive out of the bottom halves and into a workqueue
- Refactor NFSv4 error handling so synchronous and asynchronous RPC handles
errors identically.
- Fix a panic when blocks or object layouts reads return a bad data length
- Fix nfsroot so it can handle a 1024 byte long path.
- Fix bad usage of page offset in bl_read_pagelist
- Various NFSv4 callback cleanups+fixes
- Fix GETATTR bitmap verification
- Support hexadecimal number for sunrpc debug sysctl files
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWQPMXAAoJEGcL54qWCgDy6ZUQAL32vpgyMXe7R4jcxoQxm52+
tn8FrY8aBZAqucvQsIGCrYfE01W/s8goDTQdZODn0MCcoor12BTPVYNIR42/J/no
MNnRTDF0dJ4WG+inX9G87XGG6sFN3wDaQcCaexknkQZlFNF4KthxojzR2BgjmRVI
p3WKkLSNTt6DYQQ8eDetvKoDT0AjR/KCYm89tiE8GMhKYcaZl6dTazJxwOcp2CX9
YDW6+fQbsv8qp5v2ay03e88O/DSmcNRFoxy/KUGT9OwJgdN08IN8fTt6GG38yycT
D9tb9uObBRcll4PnucouadBcykGr6jAP0z8HklE266LH1dwYLOHQoDFdgAs0QGtq
nlySiKvToj6CYXonXoPOjZF3P/lxlkj5ViZ2enBxgxrPmyWl172cUSa6NTXOMO46
kPpxw50xa1gP5kkBVwIZ6XZuzl/5YRhB3BRP3g6yuJCbAwVBJvawYU7riC+6DEB9
zygVfm21vi9juUQXJ37zXVRBTtoFhFjuSxcAYxc63o181lWYShKQ3IiRYg+zTxnq
7DOhXa0ZNGvMgJJi0tH9Es3/S6TrGhyKh5gKY/o2XUjY0hCSsCSdP6jw6Mb9Ax1s
0LzByHAikxBKPt2OFeoUgwycI2xqow4iAfuFk071iP7n0nwC804cUHSkGxW67dBZ
Ve5Skkg1CV+oWQYxGmGZ
=py1V
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
New features:
- RDMA client backchannel from Chuck
- Support for NFSv4.2 file CLONE using the btrfs ioctl
Bugfixes + cleanups:
- Move socket data receive out of the bottom halves and into a
workqueue
- Refactor NFSv4 error handling so synchronous and asynchronous RPC
handles errors identically.
- Fix a panic when blocks or object layouts reads return a bad data
length
- Fix nfsroot so it can handle a 1024 byte long path.
- Fix bad usage of page offset in bl_read_pagelist
- Various NFSv4 callback cleanups+fixes
- Fix GETATTR bitmap verification
- Support hexadecimal number for sunrpc debug sysctl files"
* tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (53 commits)
Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug
nfs: Fix GETATTR bitmap verification
nfs: Remove unused xdr page offsets in getacl/setacl arguments
fs/nfs: remove unnecessary new_valid_dev check
SUNRPC: fix variable type
NFS: Enable client side NFSv4.1 backchannel to use other transports
pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS
pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors
SUNRPC: Remove the TCP-only restriction in bc_svc_process()
svcrdma: Add backward direction service for RPC/RDMA transport
xprtrdma: Handle incoming backward direction RPC calls
xprtrdma: Add support for sending backward direction RPC replies
xprtrdma: Pre-allocate Work Requests for backchannel
xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers
SUNRPC: Abstract backchannel operations
xprtrdma: Saving IRQs no longer needed for rb_lock
xprtrdma: Remove reply tasklet
xprtrdma: Use workqueue to process RPC/RDMA replies
xprtrdma: Replace send and receive arrays
xprtrdma: Refactor reply handler error handling
...
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The uninitialized tuple structure caused incorrect hash calculation
and the lookup failed.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=106441
Signed-off-by: Anthony Lineham <anthony.lineham@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
SYNACK packets might be attached to request sockets.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNACK packets might be attached to request sockets.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNACK packets might be attached to request sockets.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNACK packets might be attached to a request socket,
xt_owner wants to gte the listener in this case.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
- "Checksum offload support in user space" enablement
- Misc cxgb4 fixes, add T6 support
- Misc usnic fixes
- 32 bit build warning fixes
- Misc ocrdma fixes
- Multicast loopback prevention extension
- Extend the GID cache to store and return attributes of GIDs
- Misc iSER updates
- iSER clustering update
- Network NameSpace support for rdma CM
- Work Request cleanup series
- New Memory Registration API
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWPO5UAAoJELgmozMOVy/dSCQP/iX2ImMZOS3VkOYKhLR3dSv8
4vTEiYIoAT1JEXiPpiabuuACwotcZcMRk9kZ0dcWmBoFusTzKJmoDOkgAYd95XqY
EsAyjqtzUGNNMjH5u5W+kdbaFdH9Ktq7IJvspRlJuvzC47Srax+qBxX01jrAkDgh
4PoA3hEa2KkvkDjY2Mhvk9EWd/uflO9Ky6o0D8jUQkWtEvKBRyDjQLk30oW6wHX9
pTWqww3dD0EXTrR+PDA88v2saKH1kZFU1Nt2eU8Bw+zlJM8hcX6U7PfRX0g3HT/J
o+7ejTdLPWFDH35gJOU+KE519f1JbwfRjPJCqbOC9IttBB7iHSbhcpQLpWv4JV1x
agdBeDA3TGQj3dHb2SkYMlWXCBp7q8UCbVGvvirTFzGSGU73sc6hhP+vCKvPQIlE
Ah5tUqD7Y3mOBjvuDeIzKMLXILd5d3cH+m7Laytrf5e7fJPmBRZyOkcMh0QVElyl
mKo+PFjghgeTFb405J7SDDw/vThVyN9HyIt7AGEzObaajzOOk9R1hkQr46XVy9TK
yi58fl85yQ2n6TWV6NRnvkQoMy/N2HAEuXk/7HtO0PabV5w3Lo0zvXB9SnVrrVEm
58FWRBYCWorVSdSacuDnPm0iz45WSRIb9G9sBlhEC93eXRq2rSBoy4RvyLeliHFH
hllyhNNolI6FJ64j07Xm
=bBIY
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is my initial round of 4.4 merge window patches. There are a few
other things I wish to get in for 4.4 that aren't in this pull, as
this represents what has gone through merge/build/run testing and not
what is the last few items for which testing is not yet complete.
- "Checksum offload support in user space" enablement
- Misc cxgb4 fixes, add T6 support
- Misc usnic fixes
- 32 bit build warning fixes
- Misc ocrdma fixes
- Multicast loopback prevention extension
- Extend the GID cache to store and return attributes of GIDs
- Misc iSER updates
- iSER clustering update
- Network NameSpace support for rdma CM
- Work Request cleanup series
- New Memory Registration API"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
IB/core, cma: Make __attribute_const__ declarations sparse-friendly
IB/core: Remove old fast registration API
IB/ipath: Remove fast registration from the code
IB/hfi1: Remove fast registration from the code
RDMA/nes: Remove old FRWR API
IB/qib: Remove old FRWR API
iw_cxgb4: Remove old FRWR API
RDMA/cxgb3: Remove old FRWR API
RDMA/ocrdma: Remove old FRWR API
IB/mlx4: Remove old FRWR API support
IB/mlx5: Remove old FRWR API support
IB/srp: Dont allocate a page vector when using fast_reg
IB/srp: Remove srp_finish_mapping
IB/srp: Convert to new registration API
IB/srp: Split srp_map_sg
RDS/IW: Convert to new memory registration API
svcrdma: Port to new memory registration API
xprtrdma: Port to new memory registration API
iser-target: Port to new memory registration API
IB/iser: Port to new fast registration API
...
Pull trivial updates from Jiri Kosina:
"Trivial stuff from trivial tree that can be trivially summed up as:
- treewide drop of spurious unlikely() before IS_ERR() from Viresh
Kumar
- cosmetic fixes (that don't really affect basic functionality of the
driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek
- various comment / printk fixes and updates all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
bcache: Really show state of work pending bit
hwmon: applesmc: fix comment typos
Kconfig: remove comment about scsi_wait_scan module
class_find_device: fix reference to argument "match"
debugfs: document that debugfs_remove*() accepts NULL and error values
net: Drop unlikely before IS_ERR(_OR_NULL)
mm: Drop unlikely before IS_ERR(_OR_NULL)
fs: Drop unlikely before IS_ERR(_OR_NULL)
drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
UBI: Update comments to reflect UBI_METAONLY flag
pktcdvd: drop null test before destroy functions
Incorrect index was used when the data blob was shrinked at expiration,
which could lead to falsely expired entries and memory leak when
the comment extension was used too.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
The data extensions in ipset lacked the proper memory alignment and
thus could lead to kernel crash on several architectures. Therefore
the structures have been reorganized and alignment attributes added
where needed. The patch was tested on armv7h by Gerhard Wiesinger and
on x86_64, sparc64 by Jozsef Kadlecsik.
Reported-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem update from James Morris:
"This is mostly maintenance updates across the subsystem, with a
notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
maintainer of that"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
apparmor: clarify CRYPTO dependency
selinux: Use a kmem_cache for allocation struct file_security_struct
selinux: ioctl_has_perm should be static
selinux: use sprintf return value
selinux: use kstrdup() in security_get_bools()
selinux: use kmemdup in security_sid_to_context_core()
selinux: remove pointless cast in selinux_inode_setsecurity()
selinux: introduce security_context_str_to_sid
selinux: do not check open perm on ftruncate call
selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
KEYS: Merge the type-specific data with the payload data
KEYS: Provide a script to extract a module signature
KEYS: Provide a script to extract the sys cert list from a vmlinux file
keys: Be more consistent in selection of union members used
certs: add .gitignore to stop git nagging about x509_certificate_list
KEYS: use kvfree() in add_key
Smack: limited capability for changing process label
TPM: remove unnecessary little endian conversion
vTPM: support little endian guests
char: Drop owner assignment from i2c_driver
...
I mistakenly took wrong request sock pointer when calling tcp_move_syn()
@req_unhash is either a copy of @req, or a NULL value for
FastOpen connexions (as we do not expect to unhash the temporary
request sock from ehash table)
Fixes: 805c4bc057 ("tcp: fix req->saved_syn race")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race conditions between packet_notifier and packet_bind{_spkt}.
It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
time packet_bind{_spkt} takes a reference on the new netdevice and the
time packet_do_bind sets po->ifindex.
In this case the notification can be missed.
If this happens during a dev_change_net_namespace this can result in the
netdevice to be moved to the new namespace while the packet_sock in the
old namespace still holds a reference on it. When the netdevice is later
deleted in the new namespace the deletion hangs since the packet_sock
is not found in the new namespace' &net->packet.sklist.
It can be reproduced with the script below.
This patch makes packet_do_bind check again for the presence of the
netdevice in the packet_sock's namespace after the synchronize_net
in unregister_prot_hook.
More in general it also uses the rcu lock for the duration of the bind
to stop dev_change_net_namespace/rollback_registered_many from
going past the synchronize_net following unlist_netdevice, so that
no NETDEV_UNREGISTER notifications can happen on the new netdevice
while the bind is executing. In order to do this some code from
packet_bind{_spkt} is consolidated into packet_do_dev.
import socket, os, time, sys
proto=7
realDev='em1'
vlanId=400
if len(sys.argv) > 1:
vlanId=int(sys.argv[1])
dev='vlan%d' % vlanId
os.system('taskset -p 0x10 %d' % os.getpid())
s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
os.system('ip link add link %s name %s type vlan id %d' %
(realDev, dev, vlanId))
os.system('ip netns add dummy')
pid=os.fork()
if pid == 0:
# dev should be moved while packet_do_bind is in synchronize net
os.system('taskset -p 0x20000 %d' % os.getpid())
os.system('ip link set %s netns dummy' % dev)
os.system('ip netns exec dummy ip link del %s' % dev)
s.close()
sys.exit(0)
time.sleep(.004)
try:
s.bind(('%s' % dev, proto+1))
except:
print 'Could not bind socket'
s.close()
os.system('ip netns del dummy')
sys.exit(0)
os.waitpid(pid, 0)
s.close()
os.system('ip netns del dummy')
sys.exit(0)
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before converting a 'socket pointer' into inet socket,
use sk_fullsock() to detect timewait or request sockets.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the reasons explained in commit ce1050089c ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
req->saved_syn unless we own the request sock.
This fixes races for listeners using TCP_SAVE_SYN option.
Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2015-11-05
The following set of Bluetooth patches would be good to get into 4.4-rc1
if possible:
- Fix for missing LE CoC parameter validity checks
- Fix for potential deadlock in btusb
- Fix for issuing unsupported commands during HCI init
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the big tty and serial driver update for 4.4-rc1.
Lots of serial driver updates and a few small tty core changes. Full
details in the shortlog.
All of these have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlY6f64ACgkQMUfUDdst+ykf8gCfYPjtHy5hD/TsharaeXROnVgi
W8cAn16xk1Nmnde220MNNpO6zDu65G/1
=kslf
-----END PGP SIGNATURE-----
Merge tag 'tty-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big tty and serial driver update for 4.4-rc1.
Lots of serial driver updates and a few small tty core changes. Full
details in the shortlog.
All of these have been in linux-next for a while"
* tag 'tty-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (148 commits)
tty: Use unbound workqueue for all input workers
tty: Abstract tty buffer work
tty: Prevent tty teardown during tty_write_message()
tty: core: Use correct spinlock flavor in tiocspgrp()
tty: Combine SIGTTOU/SIGTTIN handling
serial: amba-pl011: fix incorrect integer size in pl011_fifo_to_tty()
ttyFDC: Fix build problems due to use of module_{init,exit}
tty: remove unneeded return statement
serial: 8250_mid: add support for DMA engine handling from UART MMIO
dmaengine: hsu: remove platform data
dmaengine: hsu: introduce stubs for the exported functions
dmaengine: hsu: make the UART driver in control of selecting this driver
serial: fix mctrl helper functions
serial: 8250_pci: Intel MID UART support to its own driver
serial: fsl_lpuart: add earlycon support
tty: disable unbind for old 74xx based serial/mpsc console port
serial: pl011: Spelling s/clocks-names/clock-names/
n_tty: Remove reader wakeups for TTY_BREAK/TTY_PARITY chars
tty: synclink, fix indentation
serial: at91, fix rs485 properties
...
In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
the dev_snmp6 entry that we have already registered for this device.
Call snmp6_unregister_dev in this case.
Fixes: a317a2f19d ("ipv6: fail early when creating netdev named all or default")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a connect response we should make sure that the DCID is
within the valid range and that we don't already have another channel
allocated for the same DCID.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The 'dyn_end' value is also a valid CID so it should be included in
the range of values checked.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The core spec defines specific response codes for situations when the
received CID is incorrect. Add the defines for these and return them
as appropriate from the LE Connect Request handler function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The white list commands might not be implemented if the controller does
not actually support the white list. So check the supported commands
first before issuing these commands. Not supporting the white list is
the same as supporting a white list with zero size.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When a listen socket enqueues a connection for userspace to accept(),
the sk->sk_data_ready() callback should be invoked. In-kernel socket
users rely on this callback to detect when incoming connections are
available.
Currently the sk->sk_state_change() callback is invoked by
vmci_transport.c. This happens to work for userspace applications since
sk->sk_state_change = sock_def_wakeup() and sk->sk_data_ready =
sock_def_readable() both wake up the accept() waiter. In-kernel socket
users, on the other hand, fail to detect incoming connections.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With moving netdev_sync_lower_features() after the .ndo_set_features
calls, I neglected to verify that devices added *after* a flag had been
disabled on an upper device were properly added with that flag disabled as
well. This currently happens, because we exit __netdev_update_features()
when we see dev->features == features for the upper dev. We can retain the
optimization of leaving without calling .ndo_set_features with a bit of
tweaking and a goto here.
Fixes: fd867d51f8 ("net/core: generic support for disabling netdev features down stack")
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: netdev@vger.kernel.org
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A bug report (https://bugzilla.kernel.org/show_bug.cgi?id=107071) noted
that the follwoing ip command is failing with v4.3:
$ ip route add 10.248.5.0/24 dev bond0.250 table vlan_250 src 10.248.5.154
RTNETLINK answers: Invalid argument
021dd3b8a1 changed the lookup of the given preferred source address to
use the table id passed in, but this assumes the local entries are in the
given table which is not necessarily true for non-VRF use cases. When
validating the preferred source fallback to the local table on failure.
Fixes: 021dd3b8a1 ("net: Add routes to the table associated with the device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sasha reported the following lockdep warning:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_INET);
lock(rtnl_mutex);
lock(sk_lock-AF_INET);
lock(rtnl_mutex);
This is due to that for IP_MSFILTER and MCAST_MSFILTER, we take
rtnl lock before the socket lock in setsockopt() path, but take
the socket lock before rtnl lock in getsockopt() path. All the
rest optnames are setsockopt()-only.
Fix this by aligning the getsockopt() path with the setsockopt()
path, so that all mcast socket path would be locked in the same
order.
Note, IPv6 part is different where rtnl lock is not held.
Fixes: 54ff9ef36b ("ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/netfilter/xt_TEE.c
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix crash when TEE target is used with no --oif, from Eric Dumazet.
2) Oneliner to fix a crash on the redirect traffic to localhost
infrastructure when interface has not yet an address, from
Munehisa Kamata.
3) Oneliner not to request module all the time from nfnetlink due to
wrong type value, from Florian Westphal.
I'll make sure these patches 1 and 2 hit -stable.
====================
The conflict in net/netfilter/xt_TEE.c was minor, a change
to the 'oif' selection overlapping a function signature
change for the nf_dup_ipv{4,6}() routines.
Signed-off-by: David S. Miller <davem@davemloft.net>