Commit Graph

722184 Commits

Author SHA1 Message Date
Gustavo A R Silva c30bf8cece i40e/virtchnl: fix application of sizeof to pointer
sizeof when applied to a pointer typed expression gives the size of
the pointer.

The proper fix in this particular case is to code sizeof(*vfres)
instead of sizeof(vfres).

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A R Silva <garsilva@embeddedor.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-27 13:50:35 -08:00
Vasily Averin 81833de1a4 lockd: fix "list_add double add" caused by legacy signal interface
restart_grace() uses hardcoded init_net.
It can cause to "list_add double add" in following scenario:

1) nfsd and lockd was started in several net namespaces
2) nfsd in init_net was stopped (lockd was not stopped because
 it have users from another net namespaces)
3) lockd got signal, called restart_grace() -> set_grace_period()
 and enabled lock_manager in hardcoded init_net.
4) nfsd in init_net is started again,
 its lockd_up() calls set_grace_period() and tries to add
 lock_manager into init_net 2nd time.

Jeff Layton suggest:
"Make it safe to call locks_start_grace multiple times on the same
lock_manager. If it's already on the global grace_list, then don't try
to add it again.  (But we don't intentionally add twice, so for now we
WARN about that case.)

With this change, we also need to ensure that the nfsd4 lock manager
initializes the list before we call locks_start_grace. While we're at
it, move the rest of the nfsd_net initialization into
nfs4_state_create_net. I see no reason to have it spread over two
functions like it is today."

Suggested patch was updated to generate warning in described situation.

Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Vasily Averin 9e137ed5ab nlm_shutdown_hosts_net() cleanup
nlm_complain_hosts() walks through nlm_server_hosts hlist, which should
be protected by nlm_host_mutex.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Vasily Averin 2317dc557a race of nfsd inetaddr notifiers vs nn->nfsd_serv change
nfsd_inet[6]addr_event uses nn->nfsd_serv without taking nfsd_mutex,
which can be changed during execution of notifiers and crash the host.

Moreover if notifiers were enabled in one net namespace they are enabled
in all other net namespaces, from creation until destruction.

This patch allows notifiers to access nn->nfsd_serv only after the
pointer is correctly initialized and delays cleanup until notifiers are
no longer in use.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Vasily Averin 6b18dd1c03 race of lockd inetaddr notifiers vs nlmsvc_rqst change
lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex,
nlmsvc_rqst can be changed during execution of notifiers and crash the host.

Patch enables access to nlmsvc_rqst only when it was correctly initialized
and delays its cleanup until notifiers are no longer in use.

Note that nlmsvc_rqst can be temporally set to ERR_PTR, so the "if
(nlmsvc_rqst)" check in notifiers is insufficient on its own.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Bhumika Goyal ee24eac3eb SUNRPC: make cache_detail structures const
Make these const as they are only getting passed to the function
cache_create_net having the argument as const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Bhumika Goyal ae2e408ec2 NFSD: make cache_detail structures const
Make these const as they are only getting passed to the function
cache_create_net having the argument as const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Bhumika Goyal d34971a65b sunrpc: make the function arg as const
Make the struct cache_detail *tmpl argument of the function
cache_create_net as const as it is only getting passed to kmemup having
the argument as const void *.
Add const to the prototype too.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Andrew Elble ae254dac72 nfsd: check for use of the closed special stateid
Prevent the use of the closed (invalid) special stateid by clients.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Naofumi Honda 64ebe12494 nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
From kernel 4.9, my two nfsv4 servers sometimes suffer from
    "panic: unable to handle kernel page request"
in posix_unblock_lock() called from nfs4_laundromat().

These panics diseappear if we revert the commit "nfsd: add a LRU list
for blocked locks".

The cause appears to be a typo in nfs4_laundromat(), which is also
present in nfs4_state_shutdown_net().

Cc: stable@vger.kernel.org
Fixes: 7919d0a27f "nfsd: add a LRU list for blocked locks"
Cc: jlayton@redhat.com
Reveiwed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Vasily Averin 3a2b19d1ee lockd: lost rollback of set_grace_period() in lockd_down_net()
Commit efda760fe9 ("lockd: fix lockd shutdown race") is incorrect,
it removes lockd_manager and disarm grace_period_end for init_net only.

If nfsd was started from another net namespace lockd_up_net() calls
set_grace_period() that adds lockd_manager into per-netns list
and queues grace_period_end delayed work.

These action should be reverted in lockd_down_net().
Otherwise it can lead to double list_add on after restart nfsd in netns,
and to use-after-free if non-disarmed delayed work will be executed after netns destroy.

Fixes: efda760fe9 ("lockd: fix lockd shutdown race")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Vasily Averin a3152f1440 lockd: added cleanup checks in exit_net hook
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Vasily Averin b872285751 grace: replace BUG_ON by WARN_ONCE in exit_net hook
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Andrew Elble 4f34bd0540 nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
The use of the st_mutex has been confusing the validator. Use the
proper nested notation so as to not produce warnings.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Vasily Averin e919b07652 lockd: remove net pointer from messages
Publishing of net pointer is not safe,
use net->ns.inum as net ID in debug messages

[  171.757678] lockd_up_net: per-net data created; net=f00001e7
[  171.767188] NFSD: starting 90-second grace period (net f00001e7)
[  300.653313] lockd: nuking all hosts in net f00001e7...
[  300.653641] lockd: host garbage collection for net f00001e7
[  300.653968] lockd: nlmsvc_mark_resources for net f00001e7
[  300.711483] lockd_down_net: per-net data destroyed; net=f00001e7
[  300.711847] lockd: nuking all hosts in net 0...
[  300.711847] lockd: host garbage collection for net 0
[  300.711848] lockd: nlmsvc_mark_resources for net 0

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Vasily Averin ba589528d6 nfsd: remove net pointer from debug messages
Publishing of net pointer is not safe,
replace it in debug meesages by net->ns.inum

[  119.989161] nfsd: initializing export module (net: f00001e7).
[  171.767188] NFSD: starting 90-second grace period (net f00001e7)
[  322.185240] nfsd: shutting down export module (net: f00001e7).
[  322.186062] nfsd: export shutdown complete (net: f00001e7).

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust 03da3169c6 nfsd: Fix races with check_stateid_generation()
The various functions that call check_stateid_generation() in order
to compare a client-supplied stateid with the nfs4_stid state, usually
need to atomically check for closed state. Those that perform the
check after locking the st_mutex using nfsd4_lock_ol_stateid()
should now be OK, but we do want to fix up the others.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust 9271d7e509 nfsd: Ensure we check stateid validity in the seqid operation checks
After taking the stateid st_mutex, we want to know that the stateid
still represents valid state before performing any non-idempotent
actions.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust beeca19cf1 nfsd: Fix race in lock stateid creation
If we're looking up a new lock state, and the creation fails, then
we want to unhash it, just like we do for OPEN. However in order
to do so, we need to that no other LOCK requests can grab the
mutex until we have unhashed it (and marked it as closed).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust fd1fd685b3 nfsd4: move find_lock_stateid
Trivial cleanup to simplify following patch.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust 659aefb68e nfsd: Ensure we don't recognise lock stateids after freeing them
In order to deal with lookup races, nfsd4_free_lock_stateid() needs
to be able to signal to other stateful functions that the lock stateid
is no longer valid. Right now, nfsd_lock() will check whether or not an
existing stateid is still hashed, but only in the "new lock" path.

To ensure the stateid invalidation is also recognised by the "existing lock"
path, and also by a second call to nfsd4_free_lock_stateid() itself, we can
change the type to NFS4_CLOSED_STID under the stp->st_mutex.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust fb500a7cfe nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust d8a1a00055 nfsd: Fix another OPEN stateid race
If nfsd4_process_open2() is initialising a new stateid, and yet the
call to nfs4_get_vfs_file() fails for some reason, then we must
declare the stateid closed, and unhash it before dropping the mutex.

Right now, we unhash the stateid after dropping the mutex, and without
changing the stateid type, meaning that another OPEN could theoretically
look it up and attempt to use it.

Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust 15ca08d329 nfsd: Fix stateid races between OPEN and CLOSE
Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the
nfs4_stid's type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Linus Torvalds 1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Thomas Meyer 141cbfba1d auxdisplay: img-ascii-lcd: Only build on archs that have IOMEM
This avoids the MODPOST error:

  ERROR: "devm_ioremap_resource" [drivers/auxdisplay/img-ascii-lcd.ko] undefined!

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 12:36:45 -08:00
Kirill A. Shutemov 152e93af3c mm, thp: Do not make pmd/pud dirty without a reason
Currently we make page table entries dirty all the time regardless of
access type and don't even consider if the mapping is write-protected.
The reasoning is that we don't really need dirty tracking on THP and
making the entry dirty upfront may save some time on first write to the
page.

Unfortunately, such approach may result in false-positive
can_follow_write_pmd() for huge zero page or read-only shmem file.

Let's only make page dirty only if we about to write to the page anyway
(as we do for small pages).

I've restructured the code to make entry dirty inside
maybe_p[mu]d_mkwrite(). It also takes into account if the vma is
write-protected.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 12:26:29 -08:00
Kirill A. Shutemov a8f9736645 mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 12:26:29 -08:00
Jon Maloy 2e724dca77 tipc: eliminate access after delete in group_filter_msg()
KASAN revealed another access after delete in group.c. This time
it found that we read the header of a received message after the
buffer has been released.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-27 14:44:45 -05:00
Eduardo Otubo 5b5971df3b xen-netfront: remove warning when unloading module
v2:
 * Replace busy wait with wait_event()/wake_up_all()
 * Cannot garantee that at the time xennet_remove is called, the
   xen_netback state will not be XenbusStateClosed, so added a
   condition for that
 * There's a small chance for the xen_netback state is
   XenbusStateUnknown by the time the xen_netfront switches to Closed,
   so added a condition for that.

When unloading module xen_netfront from guest, dmesg would output
warning messages like below:

  [  105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use!
  [  105.236839] deferring g.e. 0x903 (pfn 0x35805)

This problem relies on netfront and netback being out of sync. By the time
netfront revokes the g.e.'s netback didn't have enough time to free all of
them, hence displaying the warnings on dmesg.

The trick here is to make netfront to wait until netback frees all the g.e.'s
and only then continue to cleanup for the module removal, and this is done by
manipulating both device states.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-27 14:21:58 -05:00
Jens Axboe 2967acbb25 blktrace: fix trace mutex deadlock
A previous commit changed the locking around registration/cleanup,
but direct callers of blk_trace_remove() were missed. This means
that if we hit the error path in setup, we will deadlock on
attempting to re-acquire the queue trace mutex.

Fixes: 1f2cac107c ("blktrace: fix unlocked access to init/start-stop/teardown")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-27 12:03:58 -07:00
Colin Ian King 66a7c84d67 i2c: i2c-boardinfo: fix memory leaks on devinfo
Currently when an error occurs devinfo is still allocated but is
unused when the error exit paths break out of the for-loop. Fix
this by kfree'ing devinfo to avoid the leak.

Detected by CoverityScan, CID#1416590 ("Resource Leak")

Fixes: 4124c4eba4 ("i2c: allow attaching IRQ resources to i2c_board_info")
Fixes: 0daaf99d84 ("i2c: copy device properties when using i2c_register_board_info()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-11-27 19:14:29 +01:00
Hans de Goede 6e0c9507bf i2c: i801: Fix Failed to allocate irq -2147483648 error
On Apollo Lake devices the BIOS does not set up IRQ routing for the i801
SMBUS controller IRQ, so we end up with dev->irq set to IRQ_NOTCONNECTED.

Detect this and do not try to use the irq in this case silencing:
i801_smbus 0000:00:1f.1: Failed to allocate irq -2147483648: -107

Cc: stable@vger.kernel.org
BugLink: https://communities.intel.com/thread/114759
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-11-27 19:11:27 +01:00
Darrick J. Wong 509955823c xfs: log recovery should replay deferred ops in order
As part of testing log recovery with dm_log_writes, Amir Goldstein
discovered an error in the deferred ops recovery that lead to corruption
of the filesystem metadata if a reflink+rmap filesystem happened to shut
down midway through a CoW remap:

"This is what happens [after failed log recovery]:

"Phase 1 - find and verify superblock...
"Phase 2 - using internal log
"        - zero log...
"        - scan filesystem freespace and inode maps...
"        - found root inode chunk
"Phase 3 - for each AG...
"        - scan (but don't clear) agi unlinked lists...
"        - process known inodes and perform inode discovery...
"        - agno = 0
"data fork in regular inode 134 claims CoW block 376
"correcting nextents for inode 134
"bad data fork in inode 134
"would have cleared inode 134"

Hou Tao dissected the log contents of exactly such a crash:

"According to the implementation of xfs_defer_finish(), these ops should
be completed in the following sequence:

"Have been done:
"(1) CUI: Oper (160)
"(2) BUI: Oper (161)
"(3) CUD: Oper (194), for CUI Oper (160)
"(4) RUI A: Oper (197), free rmap [0x155, 2, -9]

"Should be done:
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI A
"(8) RUD: for RUI B

"Actually be done by xlog_recover_process_intents()
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI B
"(8) RUD: for RUI A

"So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
from the log record in post_mount.log (generated after umount) and the trace
print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
entry [0x155, 2, -9] are freed."

When reconstructing the internal log state from the log items found on
disk, it's required that deferred ops replay in exactly the same order
that they would have had the filesystem not gone down.  However,
replaying unfinished deferred ops can create /more/ deferred ops.  These
new deferred ops are finished in the wrong order.  This causes fs
corruption and replay crashes, so let's create a single defer_ops to
handle the subsequent ops created during replay, then use one single
transaction at the end of log recovery to ensure that everything is
replayed in the same order as they're supposed to be.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Analyzed-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-27 09:34:08 -08:00
Darrick J. Wong 98c4f78dcd xfs: always free inline data before resetting inode fork during ifree
In xfs_ifree, we reset the data/attr forks to extents format without
bothering to free any inline data buffer that might still be around
after all the blocks have been truncated off the file.  Prior to commit
43518812d2 ("xfs: remove support for inlining data/extents into the
inode fork") nobody noticed because the leftover inline data after
truncation was small enough to fit inside the inline buffer inside the
fork itself.

However, now that we've removed the inline buffer, we /always/ have to
free the inline data buffer or else we leak them like crazy.  This test
was found by turning on kmemleak for generic/001 or generic/388.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-11-27 09:33:25 -08:00
Paolo Bonzini a63dd7480d PPC KVM fixes for 4.15
One commit here, that fixes a couple of bugs relating to the patch
 series that enables HPT guests to run on a radix host on POWER9
 systems.  This patch series went upstream in the 4.15 merge window,
 so no stable backport is required.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaGKSnAAoJEJ2a6ncsY3GfF7MIANLLhznEMrWq8jw4g95WsJU1
 MkDGwp8kIdhIOM9HD6JRskoJZB5Mws2BWlQ5PSaVFxO6v6eUgNLaRb/UBxC1r7gU
 1f9/8corY4BNkezSdJqTL7Xgp13KjTU726OwYAqCPEyCSPEc9ciMyeIgyZuv2dPa
 Pju+u4tnA+9JJyskgNL+/ybOOZwVat91VmNUVRq29zP6+zo1tmIDxrQchy6Bqui/
 7Wg298G+yjAkJ8ktQu69ACk+0oEBGUOcLUlraqGSr9auR+b0nJ1PAGCDRaONdwgE
 +X+OE+t+UC6rU+coUXMwO+Id0X7HMdsLQd3066ODEtD55g8MIVZ126Wt8xDmj5o=
 =GSTh
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-fixes-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master

PPC KVM fixes for 4.15

One commit here, that fixes a couple of bugs relating to the patch
series that enables HPT guests to run on a radix host on POWER9
systems.  This patch series went upstream in the 4.15 merge window,
so no stable backport is required.
2017-11-27 17:54:13 +01:00
Jan H. Schönherr 20b7035c66 KVM: Let KVM_SET_SIGNAL_MASK work as advertised
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
"any unblocked signal received [...] will cause KVM_RUN to return with
-EINTR" and that "the signal will only be delivered if not blocked by
the original signal mask".

This, however, is only true, when the calling task has a signal handler
registered for a signal. If not, signal evaluation is short-circuited for
SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
returning or the whole process is terminated.

Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
to that in do_sigtimedwait() to avoid short-circuiting of signals.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:53:47 +01:00
Liu Bo ebb70442cd Btrfs: fix list_add corruption and soft lockups in fsync
Xfstests btrfs/146 revealed this corruption,

[   58.138831] Buffer I/O error on dev dm-0, logical block 2621424, async page read
[   58.151233] BTRFS error (device sdf): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[   58.152403] list_add corruption. prev->next should be next (ffff88005e6775d8), but was ffffc9000189be88. (prev=ffffc9000189be88).
[   58.153518] ------------[ cut here ]------------
[   58.153892] WARNING: CPU: 1 PID: 1287 at lib/list_debug.c:31 __list_add_valid+0x169/0x1f0
...
[   58.157379] RIP: 0010:__list_add_valid+0x169/0x1f0
...
[   58.161956] Call Trace:
[   58.162264]  btrfs_log_inode_parent+0x5bd/0xfb0 [btrfs]
[   58.163583]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[   58.164003]  btrfs_sync_file+0x4c2/0x6f0 [btrfs]
[   58.164393]  vfs_fsync_range+0x5f/0xd0
[   58.164898]  do_fsync+0x5a/0x90
[   58.165170]  SyS_fsync+0x10/0x20
[   58.165395]  entry_SYSCALL_64_fastpath+0x1f/0xbe
...

It turns out that we could record btrfs_log_ctx:io_err in
log_one_extents when IO fails, but make log_one_extents() return '0'
instead of -EIO, so the IO error is not acknowledged by the callers,
i.e.  btrfs_log_inode_parent(), which would remove btrfs_log_ctx:list
from list head 'root->log_ctxs'.  Since btrfs_log_ctx is allocated
from stack memory, it'd get freed with a object alive on the
list. then a future list_add will throw the above warning.

This returns the correct error in the above case.

Jeff also reported this while testing against his fsync error
patch set[1].

[1]: https://www.spinics.net/lists/linux-btrfs/msg65308.html
"btrfs list corruption and soft lockups while testing writeback error handling"

Fixes: 8407f55326 ("Btrfs: fix data corruption after fast fsync and writeback error")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-27 17:41:19 +01:00
Wanpeng Li b74558259c KVM: VMX: Fix vmx->nested freeing when no SMI handler
Reported by syzkaller:

   ------------[ cut here ]------------
   WARNING: CPU: 5 PID: 2939 at arch/x86/kvm/vmx.c:3844 free_loaded_vmcs+0x77/0x80 [kvm_intel]
   CPU: 5 PID: 2939 Comm: repro Not tainted 4.14.0+ #26
   RIP: 0010:free_loaded_vmcs+0x77/0x80 [kvm_intel]
   Call Trace:
    vmx_free_vcpu+0xda/0x130 [kvm_intel]
    kvm_arch_destroy_vm+0x192/0x290 [kvm]
    kvm_put_kvm+0x262/0x560 [kvm]
    kvm_vm_release+0x2c/0x30 [kvm]
    __fput+0x190/0x370
    task_work_run+0xa1/0xd0
    do_exit+0x4d2/0x13e0
    do_group_exit+0x89/0x140
    get_signal+0x318/0xb80
    do_signal+0x8c/0xb40
    exit_to_usermode_loop+0xe4/0x140
    syscall_return_slowpath+0x206/0x230
    entry_SYSCALL_64_fastpath+0x98/0x9a

The syzkaller testcase will execute VMXON/VMLAUCH instructions, so the
vmx->nested stuff is populated, it will also issue KVM_SMI ioctl. However,
the testcase is just a simple c program and not be lauched by something
like seabios which implements smi_handler. Commit 05cade71cf (KVM: nSVM:
fix SMI injection in guest mode) gets out of guest mode and set nested.vmxon
to false for the duration of SMM according to SDM 34.14.1 "leave VMX
operation" upon entering SMM. We can't alloc/free the vmx->nested stuff
each time when entering/exiting SMM since it will induce more overhead. So
the function vmx_pre_enter_smm() marks nested.vmxon false even if vmx->nested
stuff is still populated. What it expected is em_rsm() can mark nested.vmxon
to be true again. However, the smi_handler/rsm will not execute since there
is no something like seabios in this scenario. The function free_nested()
fails to free the vmx->nested stuff since the vmx->nested.vmxon is false
which results in the above warning.

This patch fixes it by also considering the no SMI handler case, luckily
vmx->nested.smm.vmxon is marked according to the value of vmx->nested.vmxon
in vmx_pre_enter_smm(), we can take advantage of it and free vmx->nested
stuff when L1 goes down.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Fixes: 05cade71cf (KVM: nSVM: fix SMI injection in guest mode)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:37:55 +01:00
Wanpeng Li c37c28730b KVM: VMX: Fix rflags cache during vCPU reset
Reported by syzkaller:

   *** Guest State ***
   CR0: actual=0x0000000080010031, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
   CR4: actual=0x0000000000002061, shadow=0x0000000000000000, gh_mask=ffffffffffffe8f1
   CR3 = 0x000000002081e000
   RSP = 0x000000000000fffa  RIP = 0x0000000000000000
   RFLAGS=0x00023000         DR7 = 0x00000000000000
          ^^^^^^^^^^
   ------------[ cut here ]------------
   WARNING: CPU: 6 PID: 24431 at /home/kernel/linux/arch/x86/kvm//x86.c:7302 kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
   CPU: 6 PID: 24431 Comm: reprotest Tainted: G        W  OE   4.14.0+ #26
   RIP: 0010:kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
   RSP: 0018:ffff880291d179e0 EFLAGS: 00010202
   Call Trace:
    kvm_vcpu_ioctl+0x479/0x880 [kvm]
    do_vfs_ioctl+0x142/0x9a0
    SyS_ioctl+0x74/0x80
    entry_SYSCALL_64_fastpath+0x23/0x9a

The failed vmentry is triggered by the following beautified testcase:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
        struct kvm_debugregs dr = { 0 };

        r[2] = open("/dev/kvm", O_RDONLY);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
        struct kvm_guest_debug debug = {
                .control = 0xf0403,
                .arch = {
                        .debugreg[6] = 0x2,
                        .debugreg[7] = 0x2
                }
        };
        ioctl(r[4], KVM_SET_GUEST_DEBUG, &debug);
        ioctl(r[4], KVM_RUN, 0);
    }

which testcase tries to setup the processor specific debug
registers and configure vCPU for handling guest debug events through
KVM_SET_GUEST_DEBUG.  The KVM_SET_GUEST_DEBUG ioctl will get and set
rflags in order to set TF bit if single step is needed. All regs' caches
are reset to avail and GUEST_RFLAGS vmcs field is reset to 0x2 during vCPU
reset. However, the cache of rflags is not reset during vCPU reset. The
function vmx_get_rflags() returns an unreset rflags cache value since
the cache is marked avail, it is 0 after boot. Vmentry fails if the
rflags reserved bit 1 is 0.

This patch fixes it by resetting both the GUEST_RFLAGS vmcs field and
its cache to 0x2 during vCPU reset.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:37:46 +01:00
Wanpeng Li e70b57a6ce KVM: X86: Fix softlockup when get the current kvmclock
watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
 CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc4+ #4
 RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
 Call Trace:
  get_time_ref_counter+0x5a/0x80 [kvm]
  kvm_hv_process_stimers+0x120/0x5f0 [kvm]
  kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
  kvm_vcpu_ioctl+0x33a/0x620 [kvm]
  do_vfs_ioctl+0xa1/0x5d0
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x1e/0xa9

This can be reproduced when running kvm-unit-tests/hyperv_stimer.flat and
cpu-hotplug stress simultaneously. __this_cpu_read(cpu_tsc_khz) returns 0
(set in kvmclock_cpu_down_prep()) when the pCPU is unhotplug which results
in kvm_get_time_scale() gets into an infinite loop.

This patch fixes it by treating the unhotplug pCPU as not using master clock.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:32:53 +01:00
Dr. David Alan Gilbert 12806ba937 KVM: lapic: Fixup LDR on load in x2apic
In x2apic mode the LDR is fixed based on the ID rather
than separately loadable like it was before x2.
When kvm_apic_set_state is called, the base is set, and if
it has the X2APIC_ENABLE flag set then the LDR is calculated;
however that value gets overwritten by the memcpy a few lines
below overwriting it with the value that came from userland.

The symptom is a lack of EOI after loading the state
(e.g. after a QEMU migration) and is due to the EOI bitmap
being wrong due to the incorrect LDR.  This was seen with
a Win2016 guest under Qemu with irqchip=split whose USB mouse
didn't work after a VM migration.

This corresponds to RH bug:
  https://bugzilla.redhat.com/show_bug.cgi?id=1502591

Reported-by: Yiqian Wei <yiwei@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: stable@vger.kernel.org
[Applied fixup from Liran Alon. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:32:53 +01:00
Dr. David Alan Gilbert e872fa9466 KVM: lapic: Split out x2apic ldr calculation
Split out the ldr calculation from kvm_apic_set_x2apic_id
since we're about to reuse it in the following patch.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:32:52 +01:00
Jeff Layton 9f97df50c5 reiserfs: remove unneeded i_version bump
The i_version field in reiserfs is not initialized and is only ever
updated here. Nothing ever views it, so just remove it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-11-27 17:31:07 +01:00
David S. Miller d3fe1e0185 Four fixes:
* CRYPTO_SHA256 is needed for regdb validation
  * mac80211: mesh path metric was wrong in some frames
  * mac80211: use QoS null-data packets on QoS connections
  * mac80211: tear down RX aggregation sessions first to
    drop fewer packets in HW restart scenarios
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlob6QMACgkQB8qZga/f
 l8SAvw/9E4fttLsdBhUT1gUJPoeNWsSXyhdoo0QWHs4Ni//bayNUL/Wx8QL6MGf/
 nprgxDygqFXH5iUO2gwpaMl2xeO1N7Px7bpKV/KvrDX/UAaEDyrkg6B87Omx95vT
 i584pU0LwLVL40CT+saKpF+kChe3S/rMPrAQR0xBNPntyIlo9w0vdOkdTC+vw0br
 /Nl3YJxpAmOmTDa8tiKLWHg4hSIS3Q7stylUjnEKNev4nBjBaPqfBW6sy/3o4NiR
 MfSCNvGeipr0QFZT31cHDry1fRZ/RvD23KSYNMggzctpYf6IRuHSEhIS3mW7GLpd
 GKZyLa7G7mZJAKXJAXOxb2PBBrXl/nTqSHJfme0VroGsYdqLHThXd47/0qaAB4Oo
 Io84WSmQX41kRyy4KNU7z0OFR9RgLbctdW/hwGeODxKEagGNNjuz+lbzql7jtYLn
 8UZK3LZHYC+qribCHR4Uf6A5rJP7bAcl36Ub/8l7AXNVesLagiaE0FGGrewGnE6X
 6qQLoYznd4/bjLQLVou56R9/t+o1nUuvmcGMkRxAlhvX3hw5nSUxgkWS/TgCdrBg
 cKddmM78AfEakjd6e21bH2QI2tG6fXS913lwIOq9JU870yVlNpUEr98t5nwYNP3P
 l5kzxr237Yj4W9LMUmeZIWo/laiAtgYEeQZAynAZWNxXNBdOvaA=
 =ODX1
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Four fixes:
 * CRYPTO_SHA256 is needed for regdb validation
 * mac80211: mesh path metric was wrong in some frames
 * mac80211: use QoS null-data packets on QoS connections
 * mac80211: tear down RX aggregation sessions first to
   drop fewer packets in HW restart scenarios
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28 01:09:42 +09:00
Qu Wenruo eae8d82529 btrfs: Fix wild memory access in compression level parser
[BUG]
Kernel panic when mounting with "-o compress" mount option.
KASAN will report like:
------
==================================================================
BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0
Read of size 1 at addr d86735fce994f800 by task mount/662
...
Call Trace:
 dump_stack+0xe3/0x175
 kasan_report+0x163/0x370
 __asan_load1+0x47/0x50
 strncmp+0x31/0xc0
 btrfs_compress_str2level+0x20/0x70 [btrfs]
 btrfs_parse_options+0xff4/0x1870 [btrfs]
 open_ctree+0x2679/0x49f0 [btrfs]
 btrfs_mount+0x1b7f/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 vfs_kern_mount+0x13/0x20
 btrfs_mount+0x31e/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 do_mount+0xaad/0x1a00
 SyS_mount+0x98/0xe0
 entry_SYSCALL_64_fastpath+0x1f/0xbe
------

[Cause]
For 'compress' and 'compress_force' options, its token doesn't expect
any parameter so its args[0] contains uninitialized data.
Accessing args[0] will cause above wild memory access.

[Fix]
For Opt_compress and Opt_compress_force, set compression level to
the default.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ set the default in advance ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-27 17:01:11 +01:00
Andrew Waterman 28dfbe6ed4 RISC-V: Add VDSO entries for clock_get/gettimeofday/getcpu
For now these are just placeholders that execute the syscall.  We will
later optimize them to avoid kernel crossings, but we'd like to have the
VDSO entries from the first released kernel version to make the ABI
simpler.

Signed-off-by: Andrew Waterman <andrew@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2017-11-27 07:51:39 -08:00
Palmer Dabbelt b7e5a59150 RISC-V: Remove __vdso_cmpxchg{32,64} symbol versions
These were left over from an earlier version of the port.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2017-11-27 07:51:39 -08:00
David S. Miller 6b56b1a255 Merge branch 'sctp-stream-reconfig-fixes'
Xin Long says:

====================
sctp: a bunch of fixes for stream reconfig

This patchset is to make stream reset and asoc reset work more correctly
for stream reconfig.

Thank to Marcelo making them very clear.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28 00:39:09 +09:00
Xin Long 52a395896a sctp: set sender next_tsn for the old result with ctsn_ack_point plus 1
When doing asoc reset, if the sender of the response has already sent some
chunk and increased asoc->next_tsn before the duplicate request comes, the
response will use the old result with an incorrect sender next_tsn.

Better than asoc->next_tsn, asoc->ctsn_ack_point can't be changed after
the sender of the response has performed the asoc reset and before the
peer has confirmed it, and it's value is still asoc->next_tsn original
value minus 1.

This patch sets sender next_tsn for the old result with ctsn_ack_point
plus 1 when processing the duplicate request, to make sure the sender
next_tsn value peer gets will be always right.

Fixes: 692787cef6 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28 00:38:45 +09:00