The system wakeup framework is not very consistent with respect to
the way it handles suspend-to-idle and generally wakeup events
occurring during transitions to system low-power states.
First off, system transitions in progress are aborted by the event
reporting helpers like pm_wakeup_event() only if the wakeup_count
sysfs attribute is in use (as documented), but there are cases in
which system-wide transitions should be aborted even if that is
not the case. For example, a wakeup signal from a designated
wakeup device during system-wide PM transition, it should cause
the transition to be aborted right away.
Moreover, there is a freeze_wake() call in wakeup_source_activate(),
but that really is only effective after suspend_freeze_state has
been set to FREEZE_STATE_ENTER by freeze_enter(). However, it
is very unlikely that wakeup_source_activate() will ever be called
at that time, as it could only be triggered by a IRQF_NO_SUSPEND
interrupt handler, so wakeups from suspend-to-idle don't really
occur in wakeup_source_activate().
At the same time there is a way to abort a system suspend in
progress (or wake up the system from suspend-to-idle), which is by
calling pm_system_wakeup(), but in turn that doesn't cause any
wakeup source objects to be activated, so it will not be covered
by wakeup source statistics and will not prevent the system from
suspending again immediately (in case autosleep is used, for
example). Consequently, if anyone wants to abort system transitions
in progress and allow the wakeup_count mechanism to work, they need
to use both pm_system_wakeup() and pm_wakeup_event(), say, at the
same time which is awkward.
For the above reasons, make it possible to trigger
pm_system_wakeup() from within wakeup_source_activate() and
provide a new pm_wakeup_hard_event() helper to do so within the
wakeup framework.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull x86 fixes from Thomas Gleixner:
"The final fixes for 4.11:
- prevent a triple fault with function graph tracing triggered via
suspend to ram
- prevent optimizing for size when function graph tracing is enabled
and the compiler does not support -mfentry
- prevent mwaitx() being called with a zero timeout as mwaitx() might
never return. Observed on the new Ryzen CPUs"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Prevent timer value 0 for MWAITX
x86/build: convert function graph '-Os' error to warning
ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram
Pull scheduler fix from Thomas Gleixner:
"A single fix for a cputime accounting regression which got introduced
in the 4.11 cycle"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Fix ksoftirqd cputime accounting regression
Newer hardware has uncovered a bug in the software implementation of
using MWAITX for the delay function. A value of 0 for the timer is meant
to indicate that a timeout will not be used to exit MWAITX. On newer
hardware this can result in MWAITX never returning, resulting in NMI
soft lockup messages being printed. On older hardware, some of the other
conditions under which MWAITX can exit masked this issue. The AMD APM
does not currently document this and will be updated.
Please refer to http://marc.info/?l=kvm&m=148950623231140 for
information regarding NMI soft lockup messages on an AMD Ryzen 1800X.
This has been root-caused as a 0 passed to MWAITX causing it to wait
indefinitely.
This change has the added benefit of avoiding the unnecessary setup of
MONITORX/MWAITX when the delay value is zero.
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Link: http://lkml.kernel.org/r/1493156643-29366-1-git-send-email-Janakarajan.Natarajan@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
got merged. The common code called code in another file that wasn't always
built. This just forces it on so people don't run into this bad configuration.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJZA9KFAAoJEK0CiJfG5JUlcV8P/iLnuMVopTxnaAvEF2zgQ50v
YFeqt66TRQBgU6DBfw2n6i7aXacDzi3E/LMjWCFhiAb9kWy9oMdAcZbP6ahwqtcx
bO0sEQAEZXA585SAIkzBL4tI9IZdc2Xo4ynCGM6b7bbKfJN275k9qDKQKnSVVV6A
RXjpWYMK0mBTGv4eXJlCS22r7b6YOGGC0cw1W76/+G4upOOTPjuh/nkTUAergqMd
+v6CDmAma6GPl43aXtvQw0WnxFrUF2xARscCiuNqfll6m/YxB2s+FsH38LQpDBSp
hNQsmAEkx3OqxhT1HEsA/+1Xcit0mLaJ+0c7IEQfysZvGKbQUY8DtSP8qjntPRpw
FRQQQ6CLIWaQv99RBmwl+BL2crTV1Ydd8XAEVSaEiyxN3YGt8p0ysL+bX4+POVsQ
1liwvnU/59u/NNOtzrIJXGZUj0sOlyPsmtw73HAD7b7sx7rA4YZ6HouDXefzmUOW
+NsvYnEHcTCgviKLBF3wJV1/fuVIYrzDhic0JzFZdWW7tvJ7IXYt8fLf0XcOX0ff
Ej7TVRuFp4nzH0zThu9bOYacV2Ou0nOL9zQdiSMCMelx7xChkVtiGcyDdiw7gxws
ShNOSG/wKL2RMtxlVeEiBTIR2q6r+p98OwPD1fCQ3X9eVx/8pR4s5WSLn7QxAot9
0+HuNRMvRZObTPaUTsVC
=VSUW
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One odd config build fix for a recent Allwinner clock driver change
that got merged. The common code called code in another file that
wasn't always built. This just forces it on so people don't run into
this bad configuration"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: always select CCU_GATE
Pull networking fixes from David Miller:
"Just a couple more stragglers, I really hope this is it.
1) Don't let frags slip down into the GRO segmentation handlers, from
Steffen Klassert.
2) Truesize under-estimation triggers warnings in TCP over loopback
with socket filters, 2 part fix from Eric Dumazet.
3) Fix undesirable reset of bonding MTU to ETH_HLEN on slave removal,
from Paolo Abeni.
4) If we flush the XFRM policy after garbage collection, it doesn't
work because stray entries can be created afterwards. Fix from Xin
Long.
5) Hung socket connection fixes in TIPC from Parthasarathy Bhuvaragan.
6) Fix GRO regression with IPSEC when netfilter is disabled, from
Sabrina Dubroca.
7) Fix cpsw driver Kconfig dependency regression, from Arnd Bergmann"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: hso: register netdev later to avoid a race condition
net: adjust skb->truesize in ___pskb_trim()
tcp: do not underestimate skb->truesize in tcp_trim_head()
bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
ipv4: Don't pass IP fragments to upper layer GRO handlers.
cpsw/netcp: refine cpts dependency
tipc: close the connection if protocol messages contain errors
tipc: improve error validations for sockets in CONNECTING state
tipc: Fix missing connection request handling
xfrm: fix GRO for !CONFIG_NETFILTER
xfrm: do the garbage collection after flushing policy
Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
skb_try_coalesce() using syzkaller and a filter attached to a TCP
socket.
As we did recently in commit 158f323b98 ("net: adjust skb->truesize in
pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
via a call to skb_condense().
If all frags were freed, then skb->truesize can be recomputed.
This call can be done if skb is not yet owned, or destructor is
sock_edemux().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
skb_try_coalesce() using syzkaller and a filter attached to a TCP
socket over loopback interface.
I believe one issue with looped skbs is that tcp_trim_head() can end up
producing skb with under estimated truesize.
It hardly matters for normal conditions, since packets sent over
loopback are never truncated.
Bytes trimmed from skb->head should not change skb truesize, since
skb->head is not reallocated.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On slave list updates, the bonding driver computes its hard_header_len
as the maximum of all enslaved devices's hard_header_len.
If the slave list is empty, e.g. on last enslaved device removal,
ETH_HLEN is used.
Since the bonding header_ops are set only when the first enslaved
device is attached, the above can lead to header_ops->create()
being called with the wrong skb headroom in place.
If bond0 is configured on top of ipoib devices, with the
following commands:
ifup bond0
for slave in $BOND_SLAVES_LIST; do
ip link set dev $slave nomaster
done
ping -c 1 <ip on bond0 subnet>
we will obtain a skb_under_panic() with a similar call trace:
skb_push+0x3d/0x40
push_pseudo_header+0x17/0x30 [ib_ipoib]
ipoib_hard_header+0x4e/0x80 [ib_ipoib]
arp_create+0x12f/0x220
arp_send_dst.part.19+0x28/0x50
arp_solicit+0x115/0x290
neigh_probe+0x4d/0x70
__neigh_event_send+0xa7/0x230
neigh_resolve_output+0x12e/0x1c0
ip_finish_output2+0x14b/0x390
ip_finish_output+0x136/0x1e0
ip_output+0x76/0xe0
ip_local_out+0x35/0x40
ip_send_skb+0x19/0x40
ip_push_pending_frames+0x33/0x40
raw_sendmsg+0x7d3/0xb50
inet_sendmsg+0x31/0xb0
sock_sendmsg+0x38/0x50
SYSC_sendto+0x102/0x190
SyS_sendto+0xe/0x10
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
This change addresses the issue avoiding updating the bonding device
hard_header_len when the slaves list become empty, forbidding to
shrink it below the value used by header_ops->create().
The bug is there since commit 54ef313714 ("[PATCH] bonding: Handle large
hard_header_len") but the panic can be triggered only since
commit fc791b6335 ("IB/ipoib: move back IB LL address into the hard
header").
Reported-by: Norbert P <noe@physik.uzh.ch>
Fixes: 54ef313714 ("[PATCH] bonding: Handle large hard_header_len")
Fixes: fc791b6335 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upper layer GRO handlers can not handle IP fragments, so
exit GRO processing in this case.
This fixes ESP GRO because the packet must be reassembled
before we can decapsulate, otherwise we get authentication
failures.
It also aligns IPv4 to IPv6 where packets with fragmentation
headers are not passed to upper layer GRO handlers.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Lindgren reports a kernel oops that resulted from my compile-time
fix on the default config. This shows two problems:
a) configurations that did not already enable PTP_1588_CLOCK will
now miss the cpts driver
b) when cpts support is disabled, the driver crashes. This is a
preexisting problem that we did not notice before my patch.
While the second problem is still being investigated, this modifies
the dependencies again, getting us back to the original state, with
another 'select NET_PTP_CLASSIFY' added in to avoid the original
link error we got, and the 'depends on POSIX_TIMERS' to hide
the CPTS support when turning it on would be useless.
Cc: stable@vger.kernel.org # 4.11 needs this
Fixes: 07fef36234 ("cpsw/netcp: cpts depends on posix_timers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2017-04-28
1) Do garbage collecting after a policy flush to remove old
bundles immediately. From Xin Long.
2) Fix GRO if netfilter is not defined.
From Sabrina Dubroca.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull input fix from Dmitry Torokhov:
"Yet another quirk to i8042 to get touchpad recognized on some laptops"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Clevo P650RS to the i8042 reset list
When the base driver is enabled but all SoC specific drivers are turned
off, we now get a build error after code was added to always refer to the
clk gates:
drivers/clk/built-in.o: In function `ccu_pll_notifier_cb':
:(.text+0x154f8): undefined reference to `ccu_gate_helper_disable'
:(.text+0x15504): undefined reference to `ccu_gate_helper_enable'
This changes the Kconfig to always require the gate code to be built-in
when CONFIG_SUNXI_CCU is set.
Fixes: 02ae2bc6fe ("clk: sunxi-ng: Add clk notifier to gate then ungate PLL clocks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Pull btrfs fix from Chris Mason:
"We have one more fix for btrfs.
This gets rid of a new WARN_ON from rc1 that ended up making more
noise than we really want. The larger fix for the underflow got
delayed a bit and it's better for now to put it under
CONFIG_BTRFS_DEBUG"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: qgroup: move noisy underflow warning to debugging build
Parthasarathy Bhuvaragan says:
====================
tipc: fix hanging socket connections
This patch series contains fixes for the socket layer to
prevent hanging / stale connections.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When a socket is shutting down, we notify the peer node about the
connection termination by reusing an incoming message if possible.
If the last received message was a connection acknowledgment
message, we reverse this message and set the error code to
TIPC_ERR_NO_PORT and send it to peer.
In tipc_sk_proto_rcv(), we never check for message errors while
processing the connection acknowledgment or probe messages. Thus
this message performs the usual flow control accounting and leaves
the session hanging.
In this commit, we terminate the connection when we receive such
error messages.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, the checks for sockets in CONNECTING state was based on
the assumption that the incoming message was always from the
peer's accepted data socket.
However an application using a non-blocking socket sends an implicit
connect, this socket which is in CONNECTING state can receive error
messages from the peer's listening socket. As we discard these
messages, the application socket hangs as there due to inactivity.
In addition to this, there are other places where we process errors
but do not notify the user.
In this commit, we process such incoming error messages and notify
our users about them using sk_state_change().
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In filter_connect, we use waitqueue_active() to check for any
connections to wakeup. But waitqueue_active() is missing memory
barriers while accessing the critical sections, leading to
inconsistent results.
In this commit, we replace this with an SMP safe wq_has_sleeper()
using the generic socket callback sk_data_ready().
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
our NFSv2/v3 xdr code that could crash the server or leak memory.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZAlLrAAoJECebzXlCjuG+lb8P/idTu9rLGaU1VYPInrdoXru0
iPY+p5inmGSYW2MfCGlS7disaCACgzPBKVqKjeNB1hfHn2JZCrfeBd0XvBYlc7TH
JYmlHKSBjN/ZfnYMl0WrlVUCZstt4JVmxWjO3sZTCL3nImEbGA7d13yBDagxISWh
gM9wOOiJwR5lzT/W3MezNCYj4n27/vVhODMP+Qhy0rpK08dBORIH0Bi7hwtVgdD9
cMVIXMbujmJrCz0Uhbo/DhoItcePRBrCLTXdY6WEeMmHxnXlyaA2XuA0EjjZHuMs
+BsbAOsNy1BxY1c8Z2ignYgRymUvUBHeiJeIGZLbyKKM2OJMtE4BoqlSWjx08P3Y
hTwTuaw8u7uxfAsvxepukZoonWj/uVY5tP2Hyq+K6CIKKJpB7vj+c3QGYD3dNnUu
zDl/LG3ayZgpPXqfjKRnSzZE+St1/IwDnvaM2WN2B1mkuerVr8qDGq6xd4kB0QKX
BcEKiwcb2ewfPaLlVnSXz6Wbuh2pB42BObJjC3qbOgMvQ7SBUM0UcZmFpJJ20uCR
BX20aFzB/GHcd6fTRHpDrAxB4XGdVG/8Da5Ki2WhRnmmaeSXushhiKWImujY9B6i
s4mZHu4gGGJdLzOT5u2HZl93STevriL70SA9nZPhPLQycnJMUAO0buU9UcNI7wXR
GVu2F3IHd+anxDtDwOv4
=Glhl
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs
in our NFSv2/v3 xdr code that could crash the server or leak memory"
* tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux:
nfsd: stricter decoding of write-like NFSv2/v3 ops
nfsd4: minor NFSv2/v3 write decoding cleanup
nfsd: check for oversized NFSv2/v3 arguments
stable.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJZAhJLAAoJEEp/3jgCEfOLw4gH/ia+bMzmsnkYtjMfxQfCh0ia
MHi7JS/YcAej/o71c/tvWlTU7mRbmvUCVSAcishRNytEBNGL8YzkP12vMOp/5Vdx
kKk6yDWn9z0mR5/YdBKaE8ziM5Umdy+zLqeL4yuxyhtbxKFGUPG4txJKS5WD80yU
Ld/toF2fL3y/JEs+s1pd5G+DPhEhEm2hFf56/VI6N7y08CHJgTqHB3GJ3ZnuUbnU
UhSvNR9skdVirObI8jt3oWIix8uAGq5+6MjVeTqXo75Qng5sdBGZ8S2agxXbM3j7
Hu8h/1bhKyPCUzAXnOyGcZeR+5DQolKmlKLhogbT4I9X4YC2ie4Djg0bmFHscWI=
=8aUa
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a kernel stack overflow bug in ceph setattr code, marked for
stable"
* tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client:
ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
Pull vfs fixes from Al Viro:
- fix orangefs handling of faults on write() - I'd missed that one back
when orangefs was going through review.
- readdir counterpart of "9p: cope with bogus responses from server in
p9_client_{read,write}" - server might be lying or broken, and we'd
better not overrun the kmalloc'ed buffer we are copying the results
into.
- NFS O_DIRECT read/write can leave iov_iter advanced by too much;
that's what had been causing iov_iter_pipe() warnings davej had been
seeing.
- statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should
go in before 4.11.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
uapi: change the type of struct statx_timestamp.tv_nsec to unsigned
fix nfs O_DIRECT advancing iov_iter too much
p9_client_readdir() fix
orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
The change in commit 1e2f82d1e9 ("statx: Kill fd-with-NULL-path
support in favour of AT_EMPTY_PATH") to error on a NULL pathname to
statx() is inconsistent.
It results in the error EINVAL for a NULL pathname. Other system calls
with similar APIs (fchownat(), fstatat(), linkat()), return EFAULT.
The solution is simply to remove the EINVAL check. As I already pointed
out in [1], user_path_at*() and filename_lookup() will handle the NULL
pathname as per the other APIs, to correctly produce the error EFAULT.
[1] https://lkml.org/lkml/2017/4/26/561
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In xfrm_input() when called from GRO, async == 0, and we end up
skipping the processing in xfrm4_transport_finish(). GRO path will
always skip the NF_HOOK, so we don't need the special-case for
!NETFILTER during GRO processing.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
irq_time_read() returns the irqtime minus the ksoftirqd time. This
is necessary because irq_time_read() is used to substract the IRQ time
from the sum_exec_runtime of a task. If we were to include the softirq
time of ksoftirqd, this task would substract its own CPU time everytime
it updates ksoftirqd->sum_exec_runtime which would therefore never
progress.
But this behaviour got broken by:
a499a5a14d ("sched/cputime: Increment kcpustat directly on irqtime account")
... which now includes ksoftirqd softirq time in the time returned by
irq_time_read().
This has resulted in wrong ksoftirqd cputime reported to userspace
through /proc/stat and thus "top" not showing ksoftirqd when it should
after intense networking load.
ksoftirqd->stime happens to be correct but it gets scaled down by
sum_exec_runtime through task_cputime_adjusted().
To fix this, just account the strict IRQ time in a separate counter and
use it to report the IRQ time.
Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The comment asserting that the value of struct statx_timestamp.tv_nsec
must be negative when statx_timestamp.tv_sec is negative, is wrong, as
could be seen from the following example:
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#include <asm/unistd.h>
#include <linux/stat.h>
int main(void)
{
static const struct timespec ts[2] = {
{ .tv_nsec = UTIME_OMIT },
{ .tv_sec = -2, .tv_nsec = 42 }
};
assert(utimensat(AT_FDCWD, ".", ts, 0) == 0);
struct stat st;
assert(stat(".", &st) == 0);
printf("st_mtim.tv_sec = %lld, st_mtim.tv_nsec = %lu\n",
(long long) st.st_mtim.tv_sec,
(unsigned long) st.st_mtim.tv_nsec);
struct statx stx;
assert(syscall(__NR_statx, AT_FDCWD, ".", 0, 0, &stx) == 0);
printf("stx_mtime.tv_sec = %lld, stx_mtime.tv_nsec = %lu\n",
(long long) stx.stx_mtime.tv_sec,
(unsigned long) stx.stx_mtime.tv_nsec);
return 0;
}
It expectedly prints:
st_mtim.tv_sec = -2, st_mtim.tv_nsec = 42
stx_mtime.tv_sec = -2, stx_mtime.tv_nsec = 42
The more generic comment asserting that the value of struct
statx_timestamp.tv_nsec might be negative is confusing to say the least.
It contradicts both the struct stat.st_[acm]time_nsec tradition and
struct timespec.tv_nsec requirements in utimensat syscall.
If statx syscall ever returns a stx_[acm]time containing a negative
tv_nsec that cannot be passed unmodified to utimensat syscall,
it will cause an immense confusion.
Fix this source of confusion by changing the type of struct
statx_timestamp.tv_nsec from __s32 to __u32.
Fixes: a528d35e8b ("statx: Add a system call to make enhanced file info available")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
cc: mtk.manpages@gmail.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull sparc fixes from David Miller:
"I didn't want the release to go out without the statx system call
properly hooked up"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Update syscall tables.
sparc64: Fill in rest of HAVE_REGS_AND_STACK_ACCESS_API
With the new statx() syscall, the following both allow the attributes of
the file attached to a file descriptor to be retrieved:
statx(dfd, NULL, 0, ...);
and:
statx(dfd, "", AT_EMPTY_PATH, ...);
Change the code to reject the first option, though this means copying
the path and engaging pathwalk for the fstat() equivalent. dfd can be a
non-directory provided path is "".
[ The timing of this isn't wonderful, but applying this now before we
have statx() in any released kernel, before anybody starts using the
NULL special case. - Linus ]
Fixes: a528d35e8b ("statx: Add a system call to make enhanced file info available")
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Sandeen <sandeen@sandeen.net>
cc: fstests@vger.kernel.org
cc: linux-api@vger.kernel.org
cc: linux-man@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) MLX5 bug fixes from Saeed Mahameed et al:
- released wrong resources when firmware timeout happens
- fix wrong check for encapsulation size limits
- UAR memory leak
- ETHTOOL_GRXCLSRLALL failed to fill in info->data
2) Don't cache l3mdev on mis-matches local route, causes net devices to
leak refs. From Robert Shearman.
3) Handle fragmented SKBs properly in macsec driver, the problem is
that we were mis-sizing the sgvec table. From Jason A. Donenfeld.
4) We cannot have checksum offload enabled for inner UDP tunneled
packet during IPSEC, from Ansis Atteka.
5) Fix double SKB free in ravb driver, from Dan Carpenter.
6) Fix CPU port handling in b53 DSA driver, from Florian Dainelli.
7) Don't use on-stack buffers for usb_control_msg() in CAN usb driver,
from Maksim Salau.
8) Fix device leak in macvlan driver, from Herbert Xu. We have to purge
the broadcast queue properly on port destroy.
9) Fix tx ring entry limit on EF10 devices in sfc driver. From Bert
Kenward.
10) Fix memory leaks in team driver, from Pan Bian.
11) Don't setup ipv6_stub before it can be actually used, from Paolo
Abeni.
12) Fix tipc socket flow control accounting, from Parthasarathy
Bhuvaragan.
13) Fix crash on module unload in hso driver, from Andreas Kemnade.
14) Fix purging of bridge multicast entries, the problem is that if we
don't defer it to ndo_uninit it's possible for new entries to get
added after we purge. Fix from Xin Long.
15) Don't return garbage for PACKET_HDRLEN getsockopt, from Alexander
Potapenko.
16) Fix autoneg stall properly in PHY layer, and revert micrel driver
change that was papering over it. From Alexander Kochetkov.
17) Don't dereference an ipv4 route as an ipv6 one in the ip6_tunnnel
code, from Cong Wang.
18) Clear out the congestion control private of the TCP socket in all of
the right places, from Wei Wang.
19) rawv6_ioctl measures SKB length incorrectly, fix from Jamie
Bainbridge.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
ipv6: check raw payload size correctly in ioctl
tcp: memset ca_priv data to 0 properly
ipv6: check skb->protocol before lookup for nexthop
net: core: Prevent from dereferencing null pointer when releasing SKB
macsec: dynamically allocate space for sglist
Revert "phy: micrel: Disable auto negotiation on startup"
net: phy: fix auto-negotiation stall due to unavailable interrupt
net/packet: check length in getsockopt() called with PACKET_HDRLEN
net: ipv6: regenerate host route if moved to gc list
bridge: move bridge multicast cleanup to ndo_uninit
ipv6: fix source routing
qed: Fix error in the dcbx app meta data initialization.
netvsc: fix calculation of available send sections
net: hso: fix module unloading
tipc: fix socket flow control accounting error at tipc_recv_stream
tipc: fix socket flow control accounting error at tipc_send_stream
ipv6: move stub initialization after ipv6 setup completion
team: fix memory leaks
sfc: tx ring can only have 2048 entries for all EF10 NICs
macvlan: Fix device ref leak when purging bc_queue
...
In situations where an skb is paged, the transport header pointer and
tail pointer can be the same because the skb contents are in frags.
This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
length of 0 when the length to receive is actually greater than zero.
skb->len is already correctly set in ip6_input_finish() with
pskb_pull(), so use skb->len as it always returns the correct result
for both linear and paged data.
Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always zero out ca_priv data in tcp_assign_congestion_control() so that
ca_priv data is cleared out during socket creation.
Also always zero out ca_priv data in tcp_reinit_congestion_control() so
that when cc algorithm is changed, ca_priv data is cleared out as well.
We should still zero out ca_priv data even in TCP_CLOSE state because
user could call connect() on AF_UNSPEC to disconnect the socket and
leave it in TCP_CLOSE state and later call setsockopt() to switch cc
algorithm on this socket.
Fixes: 2b0a8c9ee ("tcp: add CDG congestion control")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
neigh key as an IPv6 address:
neigh = dst_neigh_lookup(skb_dst(skb),
&ipv6_hdr(skb)->daddr);
if (!neigh)
goto tx_err_link_failure;
addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
addr_type = ipv6_addr_type(addr6);
if (addr_type == IPV6_ADDR_ANY)
addr6 = &ipv6_hdr(skb)->daddr;
memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
Also the network header of the skb at this point should be still IPv4
for 4in6 tunnels, we shold not just use it as IPv6 header.
This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
is, we are safe to do the nexthop lookup using skb_dst() and
ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
dest address we can pick here, we have to rely on callers to fill it
from tunnel config, so just fall to ip6_route_output() to make the
decision.
Fixes: ea3dc9601b ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added NULL check to make __dev_kfree_skb_irq consistent with kfree
family of functions.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We call skb_cow_data, which is good anyway to ensure we can actually
modify the skb as such (another error from prior). Now that we have the
number of fragments required, we can safely allocate exactly that amount
of memory.
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 99f81afc13.
It was papering over the real problem, which is fixed by commit
f555f34fdc ("net: phy: fix auto-negotiation stall due to unavailable
interrupt")
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet
cable was plugged before the Ethernet interface was brought up.
The patch trigger PHY state machine to update link state if PHY was requested to
do auto-negotiation and auto-negotiation complete flag already set.
During power-up cycle the PHY do auto-negotiation, generate interrupt and set
auto-negotiation complete flag. Interrupt is handled by PHY state machine but
doesn't update link state because PHY is in PHY_READY state. After some time
MAC bring up, start and request PHY to do auto-negotiation. If there are no new
settings to advertise genphy_config_aneg() doesn't start PHY auto-negotiation.
PHY continue to stay in auto-negotiation complete state and doesn't fire
interrupt. At the same time PHY state machine expect that PHY started
auto-negotiation and is waiting for interrupt from PHY and it won't get it.
Fixes: 321beec504 ("net: phy: Use interrupts when available in NOLINK state")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Cc: stable <stable@vger.kernel.org> # v4.9+
Tested-by: Roger Quadros <rogerq@ti.com>
Tested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we got a bonus week, let me try to screw a few pending fixes.
A slightly large fix is the locking fix in ASoC STI driver, but it's
pretty board-specific, and the risk is fairly low.
All the rest are small / trivial fixes, mostly marked as stable, for
ALSA sequencer core, ASoC topology, ASoC Intel bytcr and Firewire
drivers.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEECxfAB4MH3rD5mfB6bDGAVD0pKaQFAlkAczcOHHRpd2FpQHN1
c2UuZGUACgkQbDGAVD0pKaQP+Q//Z2yK7q80sbEpX6CVtTpX86Y4qNCPdO5v4p5Y
25w/kR1P1lpGUTl2FYv+CRB6hjjaNc5egTJqPGk2zjKT1ZSOR+Y1KE+woaTJ9iFF
fATKzQPaLFy+fB5w2egqB4WZnk0jx285QkDlcLPNIOrnaQwSvoc/Hlhxf+HmClhq
U2PN9ihdd9sX1XeyhmueJx98hm4BQ+MhVX4V1W86qF+qOOXmZ70pYQS/ertSpCwj
yf8na1wrWB0DChwdT44ZdHDhcnvPMqPI+Wg5R/kpJenPSdPjLht9hVkGz3Faj1TJ
30aEHzEiJClnbrJJ5vIF+5e2IXbMeNs/otdal3QmDG/u6bKUbeqNRVI0aNvqRL+r
RpwR01rDrxTuq/aL3Dn0vYoIwK49DvLLq74vovN1lySEvA6Mb0h7kH8PK6R8z9fC
yxHHDbnujo7eqLuQiwOmPDUjlRmulRn5XdnxdQtvcik4PWWfBxDO8xQNS9baVwFZ
lRwAsyuv6CAlLMGh5UbYQNnMj69KNd7HHmif4xtay2py64+NS92Q6Aba3ji50ww6
x3PEaxPta581d/b5czaJWZw1xqH+19k0U0v9ssla0OyOwHs7UFdm4NTY7hv/J6VD
MfG6XRfNEqUH/+OOoAGqYh2YywkDSj7KoyWgwX1E8RXMNnWuBQ8+I+LJH9fCtS2F
0Y5xVFs=
=TQDC
-----END PGP SIGNATURE-----
Merge tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Since we got a bonus week, let me try to screw a few pending fixes.
A slightly large fix is the locking fix in ASoC STI driver, but it's
pretty board-specific, and the risk is fairly low.
All the rest are small / trivial fixes, mostly marked as stable, for
ALSA sequencer core, ASoC topology, ASoC Intel bytcr and Firewire
drivers"
* tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: intel: Fix PM and non-atomic crash in bytcr drivers
ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type
ALSA: seq: Don't break snd_use_lock_sync() loop by timeout
ASoC: topology: Fix to store enum text values
ASoC: STI: Fix null ptr deference in IRQ handler
ALSA: oxfw: fix regression to handle Stanton SCS.1m/1d
Now xfrm garbage collection can be triggered by 'ip xfrm policy del'.
These is no reason not to do it after flushing policies, especially
considering that 'garbage collection deferred' is only triggered
when it reaches gc_thresh.
It's no good that the policy is gone but the xdst still hold there.
The worse thing is that xdst->route/orig_dst is also hold and can
not be released even if the orig_dst is already expired.
This patch is to do the garbage collection if there is any policy
removed in xfrm_policy_flush.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer. This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative. Add
checks to catch these.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.
Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.
Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.
So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.
As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.
We may also consider rejecting calls that have any extra garbage
appended. That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
then calls posix_acl_chmod().
The problem is that __ceph_setattr() calls posix_acl_chmod() before
sending the setattr request. The get_acl() call in posix_acl_chmod()
can trigger a getxattr request. The reply of the getxattr request
can restore inode's i_mode to its old value. The set_acl() call in
posix_acl_chmod() sees old value of inode's i_mode, so it calls
__ceph_setattr() again.
Cc: stable@vger.kernel.org # needs backporting for < 4.9
Link: http://tracker.ceph.com/issues/19688
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In the case getsockopt() is called with PACKET_HDRLEN and optlen < 4
|val| remains uninitialized and the syscall may behave differently
depending on its value, and even copy garbage to userspace on certain
architectures. To fix this we now return -EINVAL if optlen is too small.
This bug has been detected with KMSAN.
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taking down the loopback device wreaks havoc on IPv6 routing. By
extension, taking down a VRF device wreaks havoc on its table.
Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
FIB code while running syzkaller fuzzer. The root cause is a dead dst
that is on the garbage list gets reinserted into the IPv6 FIB. While on
the gc (or perhaps when it gets added to the gc list) the dst->next is
set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
out-of-bounds access.
Andrey's reproducer was the key to getting to the bottom of this.
With IPv6, host routes for an address have the dst->dev set to the
loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
a walk of the fib evicting routes with the 'lo' device which means all
host routes are removed. That process moves the dst which is attached to
an inet6_ifaddr to the gc list and marks it as dead.
The recent change to keep global IPv6 addresses added a new function,
fixup_permanent_addr, that is called on admin up. That function restarts
dad for an inet6_ifaddr and when it completes the host route attached
to it is inserted into the fib. Since the route was marked dead and
moved to the gc list, re-inserting the route causes the reported
out-of-bounds accesses. If the device with the address is taken down
or the address is removed, the WARN_ON in fib6_del is triggered.
All of those faults are fixed by regenerating the host route if the
existing one has been moved to the gc list, something that can be
determined by checking if the rt6i_ref counter is 0.
Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During removing a bridge device, if the bridge is still up, a new mdb entry
still can be added in br_multicast_add_group() after all mdb entries are
removed in br_multicast_dev_del(). Like the path:
mld_ifc_timer_expire ->
mld_sendpack -> ...
br_multicast_rcv ->
br_multicast_add_group
The new mp's timer will be set up. If the timer expires after the bridge
is freed, it may cause use-after-free panic in br_multicast_group_expired.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: [<ffffffffa07ed2c8>] br_multicast_group_expired+0x28/0xb0 [bridge]
Call Trace:
<IRQ>
[<ffffffff81094536>] call_timer_fn+0x36/0x110
[<ffffffffa07ed2a0>] ? br_mdb_free+0x30/0x30 [bridge]
[<ffffffff81096967>] run_timer_softirq+0x237/0x340
[<ffffffff8108dcbf>] __do_softirq+0xef/0x280
[<ffffffff8169889c>] call_softirq+0x1c/0x30
[<ffffffff8102c275>] do_softirq+0x65/0xa0
[<ffffffff8108e055>] irq_exit+0x115/0x120
[<ffffffff81699515>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff81697a5d>] apic_timer_interrupt+0x6d/0x80
Nikolay also found it would cause a memory leak - the mdb hash is
reallocated and not freed due to the mdb rehash.
unreferenced object 0xffff8800540ba800 (size 2048):
backtrace:
[<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
[<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
[<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
[<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
[<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
[<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
[<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
[<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
[<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
[<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
[<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
[<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
[<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
[<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
[<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
[<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]
This could happen when ip link remove a bridge or destroy a netns with a
bridge device inside.
With Nikolay's suggestion, this patch is to clean up bridge multicast in
ndo_uninit after bridge dev is shutdown, instead of br_dev_delete, so
that netif_running check in br_multicast_add_group can avoid this issue.
v1->v2:
- fix this issue by moving br_multicast_dev_del to ndo_uninit, instead
of calling dev_close in br_dev_delete.
(NOTE: Depends upon b6fe0440c6 ("bridge: implement missing ndo_uninit()"))
Fixes: e10177abf8 ("bridge: multicast: fix handling of temp and perm entries")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>