Commit Graph

707719 Commits

Author SHA1 Message Date
Eric Biggers 74d4108d9e dm bufio: fix integer overflow when limiting maximum cache size
The default max_cache_size_bytes for dm-bufio is meant to be the lesser
of 25% of the size of the vmalloc area and 2% of the size of lowmem.
However, on 32-bit systems the intermediate result in the expression

    (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100

overflows, causing the wrong result to be computed.  For example, on a
32-bit system where the vmalloc area is 520093696 bytes, the result is
1174405 rather than the expected 130023424, which makes the maximum
cache size much too small (far less than 2% of lowmem).  This causes
severe performance problems for dm-verity users on affected systems.

Fix this by using mult_frac() to correctly multiply by a percentage.  Do
this for all places in dm-bufio that multiply by a percentage.  Also
replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
to the comment is now defined in include/linux/vmalloc.h.

Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset")
Fixes: 95d402f057 ("dm: add bufio")
Cc: <stable@vger.kernel.org> # v3.2+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16 16:49:57 -05:00
Mike Snitzer 5d47c89f29 dm: clear all discard attributes in queue_limits when discards are disabled
Otherwise, it can happen that the QUEUE_FLAG_DISCARD isn't set but the
various discard attributes (which get exposed via sysfs) may be set.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16 16:33:55 -05:00
Mike Snitzer 7dea378b23 dm: do not set 'discards_supported' in targets that do not need it
The DM target's 'discards_supported' flag is intended to act as an
override.  Meaning, even if the underlying storage doesn't support
discards the DM target will.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16 16:33:54 -05:00
Mike Snitzer 8a74d29d54 dm: discard support requires all targets in a table support discards
A DM device with a mix of discard capabilities (due to some underlying
devices not having discard support) _should_ just return -EOPNOTSUPP for
the region of the device that doesn't support discards (even if only by
way of the underlying driver formally not supporting discards).  BUT,
that does ask the underlying driver to handle something that it never
advertised support for.  In doing so we're exposing users to the
potential for a underlying disk driver hanging if/when a discard is
issued a the device that is incapable and never claimed to support
discards.

Fix this by requiring that each DM target in a DM table provide discard
support as a prereq for a DM device to advertise support for discards.

This may cause some configurations that were happily supporting discards
(even in the face of a mix of discard support) to stop supporting
discards -- but the risk of users hitting driver hangs, and forced
reboots, outweighs supporting those fringe mixed discard
configurations.

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16 16:33:53 -05:00
Ming Lei 9dc112e2da dm mpath: remove annoying message of 'blk_get_request() returned -11'
It is very normal to see allocation failure, especially with blk-mq
request_queues, so it's unnecessary to report this error and annoy
people.

In practice this 'blk_get_request() returned -11' error gets logged
quite frequently when a blk-mq DM multipath device sees heavy IO.

This change is marked for stable@ because the annoying message in
question was included in stable@ commit 7083abbbf.

Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16 16:29:43 -05:00
Mike Snitzer ef7afb3656 dm cache: lift common migration preparation code to alloc_migration()
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:45:07 -05:00
Joe Thornber ede6507d67 dm cache: remove usused deferred_cells member from struct cache
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:45:06 -05:00
Joe Thornber 9768a10dd3 dm cache policy smq: allocate cache blocks in order
Previously, cache blocks were being allocated in reverse order.  Fix
this by pulling the block off the head of the free list.

Shouldn't have any impact on performance or latency but it is more
correct to have the cache blocks allocated/mapped in ascending order.
This fix will slightly increase the chances of two adjacent oblocks
being in adjacent cblocks.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:45:05 -05:00
Joe Thornber 8ee18ede74 dm cache policy smq: change max background work from 10240 to 4096 blocks
10240 blocks was too much, lowering this reduces the latency of copying
and consumes less memory.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:45:04 -05:00
Joe Thornber 64748b1645 dm cache background tracker: limit amount of background work that may be issued at once
On large systems the cache policy can be over enthusiastic and queue far
too much dirty data to be written back.  This consumes memory.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:45:03 -05:00
Joe Thornber deb71918ae dm cache policy smq: take origin idle status into account when queuing writebacks
If the origin device is idle try and writeback more data.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:45:02 -05:00
Joe Thornber 1e72a8e809 dm cache policy smq: handle races with queuing background_work
The background_tracker holds a set of promotions/demotions that the
cache policy wishes the core target to implement.

When adding a new operation to the tracker it's possible that an
operation on the same block is already present (but in practise this
doesn't appear to be happening).  Catch these situations and do the
appropriate cleanup.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:45:01 -05:00
Heinz Mauelshagen 2339784490 dm raid: fix panic when attempting to force a raid to sync
Requesting a sync on an active raid device via a table reload
(see 'sync' parameter in Documentation/device-mapper/dm-raid.txt)
skips the super_load() call that defines the superblock size
(rdev->sb_size) -- resulting in an oops if/when super_sync()->memset()
is called.

Fix by moving the initialization of the superblock start and size
out of super_load() to the caller (analyse_superblocks).

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:45:00 -05:00
Mikulas Patocka 95b1369a96 dm integrity: allow unaligned bv_offset
When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).

dm-integrity checks if bv_offset is aligned on page size and this check
fail with slub_debug and XFS.

Fix this bug by removing the bv_offset check, leaving only the check for
bv_len.

Fixes: 7eada909bf ("dm: add integrity target")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:59 -05:00
Mikulas Patocka 0440d5c0ca dm crypt: allow unaligned bv_offset
When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).

dm-crypt checks if bv_offset is aligned on page size and these checks
fail with slub_debug and XFS.

Fix this bug by removing the bv_offset checks. Switch to checking if
bv_len is aligned instead of bv_offset (this check should be sufficient
to prevent overruns if a bio with too small bv_len is received).

Fixes: 8f0009a225 ("dm crypt: optionally support larger encryption sector size")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Tested-by: Bruno Prémont <bonbons@sysophe.eu>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:58 -05:00
Mike Snitzer 49de576970 dm: small cleanup in dm_get_md()
Makes dm_get_md() and dm_get_from_kobject() have similar code.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:57 -05:00
Hou Tao b9a41d21dc dm: fix race between dm_get_from_kobject() and __dm_destroy()
The following BUG_ON was hit when testing repeat creation and removal of
DM devices:

    kernel BUG at drivers/md/dm.c:2919!
    CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
    Call Trace:
     [<ffffffff81649e8b>] dm_get_from_kobject+0x34/0x3a
     [<ffffffff81650ef1>] dm_attr_show+0x2b/0x5e
     [<ffffffff817b46d1>] ? mutex_lock+0x26/0x44
     [<ffffffff811df7f5>] sysfs_kf_seq_show+0x83/0xcf
     [<ffffffff811de257>] kernfs_seq_show+0x23/0x25
     [<ffffffff81199118>] seq_read+0x16f/0x325
     [<ffffffff811de994>] kernfs_fop_read+0x3a/0x13f
     [<ffffffff8117b625>] __vfs_read+0x26/0x9d
     [<ffffffff8130eb59>] ? security_file_permission+0x3c/0x44
     [<ffffffff8117bdb8>] ? rw_verify_area+0x83/0xd9
     [<ffffffff8117be9d>] vfs_read+0x8f/0xcf
     [<ffffffff81193e34>] ? __fdget_pos+0x12/0x41
     [<ffffffff8117c686>] SyS_read+0x4b/0x76
     [<ffffffff817b606e>] system_call_fastpath+0x12/0x71

The bug can be easily triggered, if an extra delay (e.g. 10ms) is added
between the test of DMF_FREEING & DMF_DELETING and dm_get() in
dm_get_from_kobject().

To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and
dm_get() are done in an atomic way, so _minor_lock is used.

The other callers of dm_get() have also been checked to be OK: some
callers invoke dm_get() under _minor_lock, some callers invoke it under
_hash_lock, and dm_start_request() invoke it after increasing
md->open_count.

Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:56 -05:00
Mikulas Patocka 856eb0916d dm: allocate struct mapped_device with kvzalloc
The structure srcu_struct can be very big, its size is proportional to the
value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field
io_barrier in the struct mapped_device has 84kB in the debugging kernel
and 50kB in the non-debugging kernel. The large size may result in failure
of the function kzalloc_node.

In order to avoid the allocation failure, we use the function
kvzalloc_node, this function falls back to vmalloc if a large contiguous
chunk of memory is not available. This patch also moves the field
io_barrier to the last position of struct mapped_device - the reason is
that on many processor architectures, short memory offsets result in
smaller code than long memory offsets - on x86-64 it reduces code size by
320 bytes.

Note to stable kernel maintainers - the kernels 4.11 and older don't have
the function kvzalloc_node, you can use the function vzalloc_node instead.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:55 -05:00
Damien Le Moal 114e025968 dm zoned: ignore last smaller runt zone
The SCSI layer allows ZBC drives to have a smaller last runt zone. For
such a device, specifying the entire capacity for a dm-zoned target
table entry fails because the specified capacity is not aligned on a
device zone size indicated in the request queue structure of the
device.

Fix this problem by ignoring the last runt zone in the entry length
when seting up the dm-zoned target (ctr method) and when iterating table
entries of the target (iterate_devices method). This allows dm-zoned
users to still easily setup a target using the entire device capacity
(as mandated by dm-zoned) or the aligned capacity excluding the last
runt zone.

While at it, replace direct references to the device queue chunk_sectors
limit with calls to the accessor blk_queue_zone_sectors().

Reported-by: Peter Desnoyers <pjd@ccs.neu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:53 -05:00
Jérémy Lefaure fbc61291d7 dm space map metadata: use ARRAY_SIZE
Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:52 -05:00
Ross Zwisler 98d82f48f1 dm log writes: add support for DAX
Now that we have the ability log filesystem writes using a flat buffer, add
support for DAX.

The motivation for this support is the need for an xfstest that can test
the new MAP_SYNC DAX flag.  By logging the filesystem activity with
dm-log-writes we can show that the MAP_SYNC page faults are writing out
their metadata as they happen, instead of requiring an explicit
msync/fsync.

Unfortunately we can't easily track data that has been written via
mmap() now that the dax_flush() abstraction was removed by commit
c3ca015fab ("dax: remove the pmem_dax_ops->flush abstraction").
Otherwise we could just treat each flush as a big write, and store the
data that is being synced to media.  It may be worthwhile to add the
dax_flush() entry point back, just as a notifier so we can do this
logging.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:51 -05:00
Ross Zwisler e5a20660a1 dm log writes: add support for inline data buffers
Currently dm-log-writes supports writing filesystem data via BIOs, and
writing internal metadata from a flat buffer via write_metadata().

For DAX writes, though, we won't have a BIO, but will instead have an
iterator that we'll want to use to fill a flat data buffer.

So, create write_inline_data() which allows us to write filesystem data
using a flat buffer as a source, and wire it up in log_one_block().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:50 -05:00
Mike Snitzer 693b960ea8 dm cache: simplify get_per_bio_data() by removing data_size argument
There is only one per_bio_data size now that writethrough-specific data
was removed from the per_bio_data structure.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:49 -05:00
Mike Snitzer 9958f1d9a0 dm cache: remove all obsolete writethrough-specific code
Now that the writethrough code is much simpler there is no need to track
so much state or cascade bio submission (as was done, via
writethrough_endio(), to issue origin then cache IO in series).

As such the obsolete writethrough list and workqueue is also removed.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:48 -05:00
Mike Snitzer 2df3bae9a6 dm cache: submit writethrough writes in parallel to origin and cache
Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device.  The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:47 -05:00
Mike Snitzer 8e3c382777 dm cache: pass cache structure to mode functions
No functional changes, just a bit cleaner than passing cache_features
structure.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:42 -05:00
Joe Thornber d1260e2a3f dm cache: fix race condition in the writeback mode overwrite_bio optimisation
When a DM cache in writeback mode moves data between the slow and fast
device it can often avoid a copy if the triggering bio either:

i) covers the whole block (no point copying if we're about to overwrite it)
ii) the migration is a promotion and the origin block is currently discarded

Prior to this fix there was a race with case (ii).  The discard status
was checked with a shared lock held (rather than exclusive).  This meant
another bio could run in parallel and write data to the origin, removing
the discard state.  After the promotion the parallel write would have
been lost.

With this fix the discard status is re-checked once the exclusive lock
has been aquired.  If the block is no longer discarded it falls back to
the slower full copy path.

Fixes: b29d4986d ("dm cache: significant rework to leverage dm-bio-prison-v2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:43:39 -05:00
Elena Reshetova 6bdd079610 dm cache: convert dm_cache_metadata.ref_count from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable dm_cache_metadata.ref_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-10-24 15:09:51 -04:00
Elena Reshetova b0b4d7c675 dm: convert table_device.count from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable table_device.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-10-24 15:09:51 -04:00
Elena Reshetova 2a0b4682e0 dm: convert dm_dev_internal.count from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable dm_dev_internal.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-10-24 15:09:51 -04:00
Linus Torvalds bb176f6709 Linux 4.14-rc6 2017-10-23 06:49:47 -04:00
Linus Torvalds dd9d064e34 Staging/IIO fixes for 4.14-rc6
Here are a small number of patches to resolve some reported IIO and a
 staging driver problem.  Nothing major here, full details are in the
 shortlog below.
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWe2ZWg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymZqgCbB3My5ciPpAlZFnuNFYrXZIvT/PoAnR9amLJM
 jwlRLHQPFQ7/Ue2zD9jc
 =aaFo
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO fixes from Greg KH:
 "Here are a small number of patches to resolve some reported IIO and a
  staging driver problem. Nothing major here, full details are in the
  shortlog below.

  All have been in linux-next with no reported issues"

* tag 'staging-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: bcm2835-audio: Fix memory corruption
  iio: adc: at91-sama5d2_adc: fix probe error on missing trigger property
  iio: adc: dln2-adc: fix build error
  iio: dummy: events: Add missing break
  staging: iio: ade7759: fix signed extension bug on shift of a u8
  iio: pressure: zpa2326: Remove always-true check which confuses gcc
  iio: proximity: as3935: noise detection + threshold changes
2017-10-23 06:37:16 -04:00
Linus Torvalds 17e7637f59 Char/Misc driver fixes for 4.14-rc6
Here are 4 small fixes for 4.14-rc6.
 
 3 of them are binder driver fixes for reported issues, and the last one
 is a hyperv driver bugfix.  Nothing major, but good fixes to get into
 4.14-final.
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWe2YnQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynN0ACdEfXVOyAcxhfxZzMBGL75sZhoh7IAnjGujCET
 +MxZZKMNFD76V0Zply+2
 =C8Pu
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are four small fixes for 4.14-rc6.

  Three of them are binder driver fixes for reported issues, and the
  last one is a hyperv driver bugfix. Nothing major, but good fixes to
  get into 4.14-final.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  android: binder: Fix null ptr dereference in debug msg
  android: binder: Don't get mm from task
  vmbus: hvsock: add proper sync for vmbus_hvsock_device_unregister()
  binder: call poll_wait() unconditionally.
2017-10-23 06:35:01 -04:00
Linus Torvalds 5805992184 USB/PHY fixes for 4.14-rc6
Here are a small number of USB and PHY driver fixes for 4.14-rc6
 
 There is the usual musb and xhci fixes in here, as well as some needed
 phy patches.  Also is a nasty regression fix for usbfs that has started
 to hit a lot of people using virtual machines.
 
 All of these have been in linux-next with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWe2XgA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynFUwCeNkG6DErmFmdpRkz38rjqLDpK6kAAoIT14/S8
 rmDxu0VfFBY2TWIPqyDA
 =7MDm
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/PHY fixes from Greg KH:
 "Here are a small number of USB and PHY driver fixes for 4.14-rc6

  There is the usual musb and xhci fixes in here, as well as some needed
  phy patches. Also is a nasty regression fix for usbfs that has started
  to hit a lot of people using virtual machines.

  All of these have been in linux-next with no reported problems"

* tag 'usb-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
  usb: hub: Allow reset retry for USB2 devices on connect bounce
  USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
  MAINTAINERS: fix git tree url for musb module
  usb: quirks: add quirk for WORLDE MINI MIDI keyboard
  usb: musb: sunxi: Explicitly release USB PHY on exit
  usb: musb: Check for host-mode using is_host_active() on reset interrupt
  usb: musb: musb_cppi41: Configure the number of channels for DA8xx
  usb: musb: musb_cppi41: Fix cppi41_set_dma_mode() for DA8xx
  usb: musb: musb_cppi41: Fix the address of teardown and autoreq registers
  USB: musb: fix late external abort on suspend
  USB: musb: fix session-bit runtime-PM quirk
  usb: cdc_acm: Add quirk for Elatec TWN3
  USB: devio: Revert "USB: devio: Don't corrupt user memory"
  usb: xhci: Handle error condition in xhci_stop_device()
  usb: xhci: Reset halted endpoint if trb is noop
  xhci: Cleanup current_cmd in xhci_cleanup_command_queue()
  xhci: Identify USB 3.1 capable hosts by their port protocol capability
  USB: serial: metro-usb: add MS7820 device id
  phy: rockchip-typec: Check for errors from tcphy_phy_init()
  phy: rockchip-typec: Don't set the aux voltage swing to 400 mV
  ...
2017-10-23 06:33:05 -04:00
Linus Torvalds 02982f8550 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
 "A fix for a broken commit in the previous pull breaking automatic
  module loading of input handlers, such ad evdev"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: do not use property bits when generating module alias
2017-10-22 16:19:12 -04:00
Dmitry Torokhov 09c3e01b25 Input: do not use property bits when generating module alias
The commit 8724ecb072 ("Input: allow matching device IDs on property
bits") started using property bits when generating module aliases for input
handlers, but did not adjust the generation of MODALIAS attribute on input
device uevents, breaking automatic module loading. Given that no handler
currently uses property bits in their module tables, let's revert this part
of the commit for now.

Reported-by: Damien Wyart <damien.wyart@gmail.com>
Tested-by: Damien Wyart <damien.wyart@gmail.com>
Fixes: 8724ecb072 ("Input: allow matching device IDs on property bits")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-10-22 12:49:59 -07:00
Linus Torvalds 936fd00549 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A couple of fixes addressing the following issues:

   - The last polishing for the TLB code, removing the last BUG_ON() and
     the debug file along with tidying up the lazy TLB code.

   - Prevent triple fault on 1st Gen. 486 caused by stupidly calling the
     early IDT setup after the first function which causes a fault which
     should be caught by the exception table.

   - Limit the mmap of /dev/mem to valid addresses

   - Prevent late microcode loading on Broadwell X

   - Remove a redundant assignment in the cache info code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
  x86/mm: Remove debug/x86/tlb_defer_switch_to_init_mm
  x86/mm: Tidy up "x86/mm: Flush more aggressively in lazy TLB mode"
  x86/mm/64: Remove the last VM_BUG_ON() from the TLB code
  x86/microcode/intel: Disable late loading on model 79
  x86/idt: Initialize early IDT before cr4_init_shadow()
  x86/cpu/intel_cacheinfo: Remove redundant assignment to 'this_leaf'
2017-10-22 06:58:23 -04:00
Linus Torvalds 9e415a8edc Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single fix to make the cs5535 clock event driver robust agaist
  spurious interrupts"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents/drivers/cs5535: Improve resilience to spurious interrupts
2017-10-22 06:56:25 -04:00
Linus Torvalds 5670a8471e Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp/hotplug fix from Thomas Gleixner:
 "The recent rework of the callback invocation missed to cleanup the
  leftovers of the operation, so under certain circumstances a
  subsequent CPU hotplug operation accesses stale data and crashes.
  Clean it up."

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Reset node state after operation
2017-10-22 06:54:42 -04:00
Linus Torvalds 085cf9bfc9 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A series of fixes for perf tooling:

   - Make xyarray return the X/Y size correctly which fixes a crash in
     the exit code.

   - Fix the libc path in test so it works not only on Debian/Ubuntu
     correctly

   - Check for eBPF file existance and output a useful error message
     instead of failing to compile a non existant file

   - Make sure perf_hpp_fmt is not longer references before freeing it

   - Use list_del_init() in the histogram code to prevent a crash when
     the already deleted element is deleted again

   - Remove the leftovers of the removed '-l' option

   - Add reviewer entries to the MAINTAINERS file"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf test shell trace+probe_libc_inet_pton.sh: Be compatible with Debian/Ubuntu
  perf xyarray: Fix wrong processing when closing evsel fd
  perf buildid-list: Fix crash when processing PERF_RECORD_NAMESPACE
  perf record: Fix documentation for a inexistent option '-l'
  perf tools: Add long time reviewers to MAINTAINERS
  perf tools: Check wether the eBPF file exists in event parsing
  perf hists: Add extra integrity checks to fmt_free()
  perf hists: Fix crash in perf_hpp__reset_output_field()
2017-10-22 06:52:53 -04:00
Linus Torvalds 4f184d7d84 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A set of small fixes mostly in the irq drivers area:

   - Make the tango irq chip work correctly, which requires a new
     function in the generiq irq chip implementation

   - A set of updates to the GIC-V3 ITS driver removing a bogus BUG_ON()
     and parsing the VCPU table size correctly"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: generic chip: remove irq_gc_mask_disable_reg_and_ack()
  irqchip/tango: Use irq_gc_mask_disable_and_ack_set
  genirq: generic chip: Add irq_gc_mask_disable_and_ack_set()
  irqchip/gic-v3-its: Add missing changes to support 52bit physical address
  irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size
  irqchip/gic-v3-its: Fix the incorrect BUG_ON in its_init_vpe_domain()
  DT: arm,gic-v3: Update the ITS size in the examples
2017-10-22 06:42:58 -04:00
Linus Torvalds b8d389e8f3 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Thomas Gleixner:
 "Plug a memory leak in the instruction decoder"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix memory leak in decode_instructions()
2017-10-22 06:39:58 -04:00
Linus Torvalds b5ac3beb5a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A little more than usual this time around. Been travelling, so that is
  part of it.

  Anyways, here are the highlights:

   1) Deal with memcontrol races wrt. listener dismantle, from Eric
      Dumazet.

   2) Handle page allocation failures properly in nfp driver, from Jaku
      Kicinski.

   3) Fix memory leaks in macsec, from Sabrina Dubroca.

   4) Fix crashes in pppol2tp_session_ioctl(), from Guillaume Nault.

   5) Several fixes in bnxt_en driver, including preventing potential
      NVRAM parameter corruption from Michael Chan.

   6) Fix for KRACK attacks in wireless, from Johannes Berg.

   7) rtnetlink event generation fixes from Xin Long.

   8) Deadlock in mlxsw driver, from Ido Schimmel.

   9) Disallow arithmetic operations on context pointers in bpf, from
      Jakub Kicinski.

  10) Missing sock_owned_by_user() check in sctp_icmp_redirect(), from
      Xin Long.

  11) Only TCP is supported for sockmap, make that explicit with a
      check, from John Fastabend.

  12) Fix IP options state races in DCCP and TCP, from Eric Dumazet.

  13) Fix panic in packet_getsockopt(), also from Eric Dumazet.

  14) Add missing locked in hv_sock layer, from Dexuan Cui.

  15) Various aquantia bug fixes, including several statistics handling
      cures. From Igor Russkikh et al.

  16) Fix arithmetic overflow in devmap code, from John Fastabend.

  17) Fix busted socket memory accounting when we get a fault in the tcp
      zero copy paths. From Willem de Bruijn.

  18) Don't leave opt->tot_len uninitialized in ipv6, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
  ipv6: flowlabel: do not leave opt->tot_len with garbage
  of_mdio: Fix broken PHY IRQ in case of probe deferral
  textsearch: fix typos in library helpers
  rxrpc: Don't release call mutex on error pointer
  net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
  net: stmmac: Fix stmmac_get_rx_hwtstamp()
  net: stmmac: Add missing call to dev_kfree_skb()
  mlxsw: spectrum_router: Configure TIGCR on init
  mlxsw: reg: Add Tunneling IPinIP General Configuration Register
  net: ethtool: remove error check for legacy setting transceiver type
  soreuseport: fix initialization race
  net: bridge: fix returning of vlan range op errors
  sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
  bpf: add test cases to bpf selftests to cover all access tests
  bpf: fix pattern matches for direct packet access
  bpf: fix off by one for range markings with L{T, E} patterns
  bpf: devmap fix arithmetic overflow in bitmap_size calculation
  net: aquantia: Bad udp rate on default interrupt coalescing
  net: aquantia: Enable coalescing management via ethtool interface
  ...
2017-10-21 22:44:48 -04:00
Bernd Edlinger 8d5f4b0717 stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
This is the possible reason for different hard to reproduce
problems on my ARMv7-SMP test system.

The symptoms are in recent kernels imprecise external aborts,
and in older kernels various kinds of network stalls and
unexpected page allocation failures.

My testing indicates that the trouble started between v4.5 and v4.6
and prevails up to v4.14.

Using the dirty_tx before acquiring the spin lock is clearly
wrong and was first introduced with v4.6.

Fixes: e3ad57c967 ("stmmac: review RX/TX ring management")

Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:24:43 +01:00
Eric Dumazet 864e2a1f8a ipv6: flowlabel: do not leave opt->tot_len with garbage
When syzkaller team brought us a C repro for the crash [1] that
had been reported many times in the past, I finally could find
the root cause.

If FlowLabel info is merged by fl6_merge_options(), we leave
part of the opt_space storage provided by udp/raw/l2tp with random value
in opt_space.tot_len, unless a control message was provided at sendmsg()
time.

Then ip6_setup_cork() would use this random value to perform a kzalloc()
call. Undefined behavior and crashes.

Fix is to properly set tot_len in fl6_merge_options()

At the same time, we can also avoid consuming memory and cpu cycles
to clear it, if every option is copied via a kmemdup(). This is the
change in ip6_setup_cork().

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cb64a100 task.stack: ffff8801cc350000
RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
 udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x358/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4520a9
RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:22:24 +01:00
Geert Uytterhoeven 66bdede495 of_mdio: Fix broken PHY IRQ in case of probe deferral
If an Ethernet PHY is initialized before the interrupt controller it is
connected to, a message like the following is printed:

    irq: no irq domain found for /interrupt-controller@e61c0000 !

However, the actual error is ignored, leading to a non-functional (POLL)
PHY interrupt later:

    Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)

Depending on whether the PHY driver will fall back to polling, Ethernet
may or may not work.

To fix this:
  1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
     of_irq_get().
     Unlike the former, the latter returns -EPROBE_DEFER if the
     interrupt controller is not yet available, so this condition can be
     detected.
     Other errors are handled the same as before, i.e. use the passed
     mdio->irq[addr] as interrupt.
  2. Propagate and handle errors from of_mdiobus_register_phy() and
     of_mdiobus_register_device().

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:20:25 +01:00
Randy Dunlap 7433a8d6fa textsearch: fix typos in library helpers
Fix spellos (typos) in textsearch library helpers.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:14:07 +01:00
David Howells 6cb3ece968 rxrpc: Don't release call mutex on error pointer
Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
call pointer actually holds an error value.

Fixes: 540b1c48c3 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:05:39 +01:00
David S. Miller 748759d57e Merge branch 'stmmac-hw-tstamp-fixes'
Jose Abreu says:

====================
net: stmmac: Fix HW timestamping

Three fixes for HW timestamping feature, all of them for RX side.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:50:40 +01:00
Jose Abreu 9454360dec net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
Prevent infinite loop by correctly setting the loop condition to
break when i == 10.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:50:40 +01:00