Commit 729204ef49ec ("block: relax check on sg gap") allows us to merge
bios if both are physically contiguous. This change can merge a huge
number of small bios; with mkfs, for example, the running time of
mkfs.ntfs can be decreased to ~1/10.
But if one rq starts with a non-aligned buffer (the 1st bvec's bv_offset
is non-zero) and if we allow the merge, it is quite difficult to respect
sg gap limit, especially the max segment size, or we risk having an
unaligned virtual boundary. This patch tries to avoid the issue by
disallowing a merge, if the req starts with an unaligned buffer.
Also add comments to explain why the merged segment can't end at an
unaligned virt boundary.
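The rule described above can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: `struct ex_bvec`, `ex_merge_will_gap()` and the `boundary_mask` parameter are hypothetical stand-ins for the first/last bvec of a request and the queue's virt boundary mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct bio_vec. */
struct ex_bvec {
	unsigned int offset;	/* like bv_offset: offset into the page */
	unsigned int len;	/* like bv_len: length in bytes */
};

/*
 * Return true if merging 'next' behind a request whose first bvec is
 * 'first' and whose last bvec is 'last' must be refused.
 * 'boundary_mask' plays the role of the virt boundary mask
 * (for example 0xfff for a 4K boundary); 0 means no limit.
 */
static bool ex_merge_will_gap(const struct ex_bvec *first,
			      const struct ex_bvec *last,
			      const struct ex_bvec *next,
			      unsigned int boundary_mask)
{
	assert(first && last && next);

	if (!boundary_mask)
		return false;

	/* Rule from this patch: never merge if the request starts unaligned. */
	if (first->offset)
		return true;

	/* Usual gap check: the previous end and the next start must be aligned. */
	return (next->offset & boundary_mask) ||
	       ((last->offset + last->len) & boundary_mask);
}
```

With a 4K boundary, a request whose first buffer starts at offset 512 is refused even when its end happens to be aligned.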
Fixes: 729204ef49 ("block: relax check on sg gap")
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Rewrote parts of the commit message and comments.
Signed-off-by: Jens Axboe <axboe@fb.com>
Merge tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a cpufreq core regression related to CPU online/offline and
several issues in the turbostat and cpupower utilities.
Specifics:
- Allow CPUs to be put back online even if the cpufreq driver is
unable to work with them (eg. due to missing information from
platform firmware), which was the previous behavior expected by
users, but changed in the 4.9 time frame (Chen Yu).
- Fix a few minor issues in the turbostat utility, introduced mostly
during the recent update of it (Len Brown, Doug Smythies).
- Fix a cpupower utility bug causing it to report incorrect values
for turbo frequencies in some cases (Ben Hutchings)"
* tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores
cpufreq: Bring CPUs up even if cpufreq_online() failed
tools/power turbostat: update version number
tools/power turbostat: fix impossibly large CPU%c1 value
tools/power turbostat: turbostat.8 add missing column definitions
tools/power turbostat: update HWP dump to decimal from hex
tools/power turbostat: enable package THERM_INTERRUPT dump
tools/power turbostat: show missing Core and GFX power on SKL and KBL
tools/power turbostat: bugfix: GFXMHz column not changing
Merge tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert a recent ACPICA commit that turned out to be problematic
and fix a device enumeration breakage from the 4.8 cycle.
Specifics:
- Revert a recent ACPICA commit targeted at catching firmware bugs
which promptly did that and caused functional problems to appear
(Rafael Wysocki).
- Fix a device enumeration problem introduced in the 4.8 time frame
which caused the ACPI docking station driver to report incorrect
status via sysfs among other things (Rafael Wysocki)"
* tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Resources: Not a valid resource if buffer length too long"
ACPI / scan: Set the visited flag for all enumerated devices
Merge tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull CONFIG_STRICT_DEVMEM fix from Kees Cook:
"Fixes /dev/mem to read back zeros for System RAM areas in the 1MB
exception area on x86 to avoid exposing RAM or tripping hardened
usercopy"
* tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
mm: Tighten x86 /dev/mem with zeroing reads
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael S. Tsirkin:
"virtio oops fixes
The virtio pci rework using shared interrupts caused a lot of issues.
We tried to fix them but ran out of time. Revert for now, and revisit
the issue for the next kernel.
Luckily we are able to do this without losing automatic interrupt
NUMA affinity which was the main motivator for the rework"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-pci: Remove affinity hint before freeing the interrupt
Revert "virtio_pci: remove struct virtio_pci_vq_info"
Revert "virtio_pci: use shared interrupts for virtqueues"
Revert "virtio_pci: don't duplicate the msix_enable flag in struct pci_dev"
Revert "virtio_pci: simplify MSI-X setup"
Revert "virtio_pci: fix out of bound access for msix_names"
MAINTAINERS: fix virtio file pattern
virtio_console: fix uninitialized variable use
virtio_net: clear MTU when out of range
virtio: allow drivers to validate features
virtio_net: enable big packets for large MTU values
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Missing TCP header sanity check in TCPMSS target, from Eric Dumazet.
2) Incorrect event message type for related conntracks created via
ctnetlink, from Liping Zhang.
3) Fix incorrect rcu locking when handling helpers from ctnetlink,
from Gao Feng.
4) Fix missing rcu locking when updating helper, from Liping Zhang.
5) Fix missing read_lock_bh when iterating over list of device addresses
from TPROXY and redirect, also from Liping.
6) Fix crash when trying to dump expectations from conntrack with no
helper via ctnetlink, from Liping.
7) Missing RCU protection to expectation list update given ctnetlink
iterates over the list under rcu read lock side, from Liping too.
8) Don't dump autogenerated seed in nft_hash to userspace, this is
very confusing to the user, again from Liping.
9) Fix wrong conntrack netns module refcount in ipt_CLUSTERIP,
from Gao Feng.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 561eb9d09a ("fbdev: omap/lcd: Make callbacks optional") made
panel callbacks optional but forgot to update check_required_callbacks().
As a result many (all?) OMAP systems using omapfb will crash at boot.
Fix by deleting the whole function.
Fixes: 561eb9d09a ("fbdev: omap/lcd: Make callbacks optional")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
* acpi-scan-fixes:
ACPI / scan: Set the visited flag for all enumerated devices
* acpica-fixes:
Revert "ACPICA: Resources: Not a valid resource if buffer length too long"
* pm-cpufreq-fixes:
cpufreq: Bring CPUs up even if cpufreq_online() failed
* pm-tools-fixes:
cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores
tools/power turbostat: update version number
tools/power turbostat: fix impossibly large CPU%c1 value
tools/power turbostat: turbostat.8 add missing column definitions
tools/power turbostat: update HWP dump to decimal from hex
tools/power turbostat: enable package THERM_INTERRUPT dump
tools/power turbostat: show missing Core and GFX power on SKL and KBL
tools/power turbostat: bugfix: GFXMHz column not changing
Merge fixes from Andrew Morton:
"11 fixes.
The presence of 'thp: reduce indentation level in change_huge_pmd()'
is unfortunate. But the patchset had been decently reviewed and tested
before we decided it was needed in -stable and I felt it best not to
churn things at the last minute"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mailmap: add Martin Kepplinger's email
zsmalloc: expand class bit
zram: do not use copy_page with non-page aligned address
zram: fix operator precedence to get offset
hugetlbfs: fix offset overflow in hugetlbfs mmap
thp: fix MADV_DONTNEED vs clear soft dirty race
thp: fix MADV_DONTNEED vs. MADV_FREE race
mm: drop unused pmdp_huge_get_and_clear_notify()
thp: fix MADV_DONTNEED vs. numa balancing race
thp: reduce indentation level in change_huge_pmd()
z3fold: fix page locking in z3fold_alloc()
Set the partly deprecated companies' email addresses as aliases for the
personal one.
Link: http://lkml.kernel.org/r/1491984622-17321-1-git-send-email-martin.kepplinger@ginzinger.com
Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a 64K page system, zsmalloc has 257 classes, so 8 class bits are not
enough. As a result, the system is corrupted when zsmalloc stores
65536-byte data (i.e., index number 256). This patch increases the class
bits as a simple fix suitable for stable backport. We should clean up
this mess soon.
index   size
0       32
1       288
..
..
204     52256
256     65536
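The table can be reproduced with a little arithmetic. The sketch below assumes the values the table implies (64K pages, a minimum size of 32 bytes, a 256-byte step between classes); all `EX_`/`ex_` names are illustrative and not zsmalloc's own.

```c
#include <assert.h>

/* Values assumed for a 64K-page system; names are illustrative. */
enum {
	EX_PAGE_SIZE = 65536,
	EX_MIN_ALLOC = 32,			/* size of class 0 */
	EX_CLASS_DELTA = EX_PAGE_SIZE >> 8,	/* 256-byte step between classes */
};

/* Number of size classes: one per delta step, plus the final page-size class. */
static unsigned int ex_nr_classes(void)
{
	unsigned int span = EX_PAGE_SIZE - EX_MIN_ALLOC;

	return (span + EX_CLASS_DELTA - 1) / EX_CLASS_DELTA + 1;
}

/* Object size served by class 'idx', capped at the page size. */
static unsigned int ex_class_size(unsigned int idx)
{
	unsigned int size = EX_MIN_ALLOC + idx * EX_CLASS_DELTA;

	return size > EX_PAGE_SIZE ? EX_PAGE_SIZE : size;
}

/* Bits needed to store any index in [0, n). */
static unsigned int ex_bits_for(unsigned int n)
{
	unsigned int bits = 0;

	assert(n > 0);
	while ((1u << bits) < n)
		bits++;
	return bits;
}
```

Under these assumptions there are 257 classes, and index 256 needs a ninth bit, which is why an 8-bit field cannot represent it.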
Fixes: 3783689a1 ("zsmalloc: introduce zspage structure")
Link: http://lkml.kernel.org/r/1492042622-12074-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
copy_page is an optimized memcpy for page-aligned addresses. If it is
used with a non-page-aligned address, it can corrupt memory, which means
system corruption. With zram, this can happen with
1. 64K architecture
2. partial IO
3. slub debug
Partial IO needs to allocate a page, and zram allocates it via kmalloc.
With slub debug, kmalloc(PAGE_SIZE) doesn't return a page-size-aligned
address. And finally, copy_page(mem, cmem) corrupts memory.
So, this patch changes it to memcpy.
Actually, we don't need to change the zram_bvec_write part because
zsmalloc returns a page-aligned address in the PAGE_SIZE class case, but
it's not good to rely on the internals of zsmalloc.
Note:
When this patch is merged to stable, clear_page should be fixed, too.
Unfortunately, recent zram removed it with the "same page merge" feature,
so it's hard to backport this patch to the -stable tree.
I will handle it when I receive the mail from the stable tree maintainer
asking to backport this patch.
Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()")
Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In zram_rw_page, the logic to get offset is wrong by operator precedence
(i.e., "<<" is higher than "&"). With wrong offset, zram can corrupt
the user's data. This patch fixes it.
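The precedence problem is easy to reproduce in isolation. The sketch below assumes 512-byte sectors and 8 sectors per page; the constants and function names are illustrative, and the "buggy" function spells out with explicit parentheses how the unparenthesised expression is parsed.

```c
#include <assert.h>

/* Assumed values: 512-byte sectors and 4K pages. Names are illustrative. */
#define EX_SECTOR_SHIFT		9
#define EX_SECTORS_PER_PAGE	8ul

/*
 * How "sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT" is parsed:
 * "<<" binds tighter than "&", so the mask is shifted before it is applied.
 */
static unsigned long ex_offset_buggy(unsigned long sector)
{
	return sector & ((EX_SECTORS_PER_PAGE - 1) << EX_SECTOR_SHIFT);
}

/* Intended form: mask the sector within the page, then convert to bytes. */
static unsigned long ex_offset_fixed(unsigned long sector)
{
	assert(EX_SECTORS_PER_PAGE != 0);
	return (sector & (EX_SECTORS_PER_PAGE - 1)) << EX_SECTOR_SHIFT;
}
```

For sector 3 the intended byte offset is 3 * 512 = 1536, while the mis-parsed expression yields 0.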
Fixes: 8c7f01025 ("zram: implement rw_page operation of zram")
Link: http://lkml.kernel.org/r/1492042622-12074-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If mmap() maps a file, it can be passed an offset into the file at which
the mapping is to start. Offset could be a negative value when
represented as a loff_t. The offset plus length will be used to update
the file size (i_size) which is also a loff_t.
Validate the value of offset and offset + length to make sure they do
not overflow and appear as negative.
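The validation described above can be sketched without relying on signed overflow. This is a userspace illustration, not the hugetlbfs code: `ex_mmap_range_ok()` is a hypothetical helper, and `int64_t` stands in for loff_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a mapping of 'len' bytes starting at page offset 'pgoff'
 * stays representable as a non-negative 64-bit signed file offset
 * (the role loff_t plays in the kernel). 'page_shift' is log2(page size).
 */
static bool ex_mmap_range_ok(uint64_t pgoff, uint64_t len,
			     unsigned int page_shift)
{
	assert(page_shift < 63);

	/* The byte offset itself must fit: pgoff << page_shift <= INT64_MAX. */
	if (pgoff > ((uint64_t)INT64_MAX >> page_shift))
		return false;

	uint64_t off = pgoff << page_shift;

	/* offset + length must not go past INT64_MAX either. */
	return len <= (uint64_t)INT64_MAX - off;
}
```

Assuming 4K pages, a byte offset of 0x8000000000000000 corresponds to page offset 2^51, which this check rejects because the byte offset would be negative as a signed value.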
Found by syzkaller with commit ff8c0c53c4 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied. Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.
To reproduce:
mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);
Resulted in,
kernel BUG at mm/hugetlb.c:742!
Call Trace:
hugetlbfs_evict_inode+0x80/0xa0
evict+0x24a/0x620
iput+0x48f/0x8c0
dentry_unlink_inode+0x31f/0x4d0
__dentry_kill+0x292/0x5e0
dput+0x730/0x830
__fput+0x438/0x720
____fput+0x1a/0x20
task_work_run+0xfe/0x180
exit_to_usermode_loop+0x133/0x150
syscall_return_slowpath+0x184/0x1c0
entry_SYSCALL_64_fastpath+0xab/0xad
Fixes: ff8c0c53c4 ("mm/hugetlb.c: don't call region_abort if region_chg fails")
Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.com
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yet another instance of the same race.
Fix is identical to change_huge_pmd().
See "thp: fix MADV_DONTNEED vs. numa balancing race" for more details.
Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both MADV_DONTNEED and MADV_FREE are handled with down_read(mmap_sem).
It's critical to not clear pmd intermittently while handling MADV_FREE
to avoid race with MADV_DONTNEED:
CPU0:                                           CPU1:
                                                madvise_free_huge_pmd()
                                                 pmdp_huge_get_and_clear_full()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
                                                 set_pmd_at();
                                                 // pmd is re-established
It results in MADV_DONTNEED skipping the pmd, leaving it not cleared.
It violates the MADV_DONTNEED interface and can result in userspace
misbehaviour.
Basically it's the same race as with numa balancing in
change_huge_pmd(), but a bit simpler to mitigate: we don't need to
preserve dirty/young flags here due to MADV_FREE functionality.
[kirill.shutemov@linux.intel.com: Urgh... Power is special again]
Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@black.fi.intel.com
Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the
last pmdp_huge_get_and_clear_notify() user is gone.
Let's drop the helper.
Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the prot_numa case, we are under down_read(mmap_sem). It's critical to
not clear pmd intermittently to avoid race with MADV_DONTNEED which is
also under down_read(mmap_sem):
CPU0:                                           CPU1:
                                                change_huge_pmd(prot_numa=1)
                                                 pmdp_huge_get_and_clear_notify()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
                                                 set_pmd_at();
                                                 // pmd is re-established
The race makes MADV_DONTNEED miss the huge pmd and not clear it,
which may break userspace.
Found by code analysis, never saw it triggered.
Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "thp: fix few MADV_DONTNEED races"
For MADV_DONTNEED to work properly with huge pages, it's critical to not
clear pmd intermittently unless you hold down_write(mmap_sem).
Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
breakage.
See example of such race in commit message of patch 2/4.
All these races were found by code inspection. I haven't seen them
triggered. I don't think it's worth applying them to stable@.
This patch (of 4):
Restructure code in preparation for a fix.
Link: http://lkml.kernel.org/r/20170302151034.27829-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stress testing of the current z3fold implementation on an 8-core system
revealed it was possible that a z3fold page deleted from its unbuddied
list in z3fold_alloc() would be put on another unbuddied list by
z3fold_free() while z3fold_alloc() is still processing it. This was
introduced by commit 5a27aa822 ("z3fold: add kref refcounting")
due to the removal of special handling of a z3fold page not on any list
in z3fold_free().
To fix this, the z3fold page lock should be taken in z3fold_alloc()
before the pool lock is released. To avoid deadlocking, we just try to
lock the page as soon as we get a hold of it, and if trylock fails, we
drop this page and take the next one.
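The "trylock and move on" pattern described above can be sketched with C11 atomics. This is an illustration only: `struct ex_page` and the `ex_` helpers are hypothetical, an `atomic_flag` stands in for the z3fold page lock, and the pool lock that would surround the walk is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical page with a try-lockable flag; not the real z3fold header. */
struct ex_page {
	atomic_flag locked;
	int id;
};

/* Returns non-zero if the lock was taken without waiting. */
static int ex_trylock(struct ex_page *p)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&p->locked);
}

static void ex_unlock(struct ex_page *p)
{
	atomic_flag_clear(&p->locked);
}

/*
 * Walk candidate pages (as if holding the pool lock) and return the first
 * one whose lock could be taken without waiting; busy pages are skipped.
 * The returned page is locked. Returns NULL if every candidate is busy.
 */
static struct ex_page *ex_pick_page(struct ex_page *pages, size_t n)
{
	assert(pages || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (ex_trylock(&pages[i]))
			return &pages[i];
	}
	return NULL;
}
```

Because the walker never blocks on a page lock while it conceptually holds the pool lock, it cannot deadlock against a path that takes the two locks in the opposite order.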
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: <Oleksiy.Avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ia64 build generates many warnings like this:
WARNING: EXPORT symbol "empty_zero_page" [vmlinux] version generation failed, symbol will not be versioned.
Besides adding the necessary header this also requires fiddling with
some explicit .S -> .o rules.
Cc: IA64-ML <linux-ia64@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a math error calculating the extra_vecs. The error assumed
only 1 cpu per vector, but the value needs to account for the actual
number of cpus per vector in order to get the correct remainder for
extra CPU assignment.
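The arithmetic can be illustrated as follows. This is a sketch under the assumption that each vector first receives the same whole number of CPUs; the function names are hypothetical and the code is not taken from the affinity implementation.

```c
#include <assert.h>

/*
 * CPUs left over after giving each of 'vecs' vectors the same whole number
 * of CPUs out of 'ncpus'. Assumes 0 < vecs <= ncpus.
 */
static unsigned int ex_extra_cpus(unsigned int ncpus, unsigned int vecs)
{
	assert(vecs > 0 && vecs <= ncpus);

	unsigned int cpus_per_vec = ncpus / vecs;

	return ncpus - vecs * cpus_per_vec;	/* same as ncpus % vecs */
}

/* The earlier assumption of one CPU per vector, kept for comparison. */
static unsigned int ex_extra_cpus_old(unsigned int ncpus, unsigned int vecs)
{
	assert(vecs <= ncpus);
	return ncpus - vecs;
}
```

With 8 CPUs and 3 vectors, each vector gets 2 CPUs and 2 CPUs remain to be spread out; the one-CPU-per-vector assumption would report 5.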
Fixes: 7bf8222b9b ("irq/affinity: Fix CPU spread for unbalanced nodes")
Reported-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Link: http://lkml.kernel.org/r/1492104492-19943-1-git-send-email-keith.busch@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current code wrongly invokes nf_ct_netns_get in the destroy routine;
it should use nf_ct_netns_put instead.
As a result, some modules could not be unloaded.
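Why a get in the destroy path pins things can be seen with a toy counter. Everything below is hypothetical (`struct ex_netns_ref` and the `ex_` functions are not the netfilter API); it only models one reference taken at set-up and the two possible tear-down behaviours.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-netns conntrack user count. */
struct ex_netns_ref {
	int users;
};

static void ex_netns_get(struct ex_netns_ref *r)
{
	r->users++;
}

static void ex_netns_put(struct ex_netns_ref *r)
{
	assert(r->users > 0);
	r->users--;
}

/* Target set-up takes one reference. */
static void ex_target_init(struct ex_netns_ref *r)
{
	ex_netns_get(r);
}

/* Correct tear-down drops the reference taken at set-up. */
static void ex_target_destroy(struct ex_netns_ref *r)
{
	ex_netns_put(r);
}

/* The bug described above: tear-down takes a second reference instead. */
static void ex_target_destroy_buggy(struct ex_netns_ref *r)
{
	ex_netns_get(r);
}
```

After a buggy tear-down the count is 2 instead of 0, so whatever the count protects is never released.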
Fixes: ecb2421b5d ("netfilter: add and use nf_ct_netns_get/put")
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This can prevent the nft utility from printing out the auto generated
seed to the user, which is unnecessary and confusing.
Fixes: cb1b69b0b1 ("netfilter: nf_tables: add hash expression")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Johannes Berg says:
====================
netlink extended ACK reporting
Changes since v4:
* use __NLMSGERR_ATTR_MAX instead of NUM_NLMSGERR_ATTRS
Changes since v3:
* Add NLM_F_CAPPED and NLM_F_ACK_TLVS flags, to allow entirely
stateless parsing of the ACK messages by looking at the new
flags. Need to check NLM_F_ACK_TLVS first, since capping can
be done in kernels before this patchset without setting the
flag.
* Remove "missing_attr" functionality - this can obviously be
added back rather easily, but I'd rather have more discussion
about the nesting problem there.
* Improve documentation of NLMSGERR_ATTR_OFFS
* Improve message structure documentation, documenting that the
request message is always capped for success cases
* fix nlmsg_len of the outer message by calling nlmsg_end()
* fix memcpy() of the request in success cases, going back to
the original code that I'd changed before due to the payload
adjustments that I reverted when introducing tlvlen
Changes since v2:
* add NUM_NLMSGERR_ATTRS, NLMSGERR_ATTR_MAX
* fix cookie length to 20 (sha-1 length)
* move struct members for cookie to patch 3 where they should be
* another cleanup suggested by David Ahern
Changes since v1:
* credit Pablo and Jamal
* incorporate suggestion from David Ahern
* fix compilation in decnet
====================
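The stateless parsing mentioned in the v3 notes might look roughly like this on the receiving side. It is a sketch under assumptions: the structures are local copies for illustration (real code would use <linux/netlink.h>), the 4-byte alignment of the echoed request payload is assumed, and `ex_ack_tlv_offset()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies for illustration; real code would use <linux/netlink.h>. */
#define EX_NLM_F_CAPPED		0x100	/* request was capped to its header */
#define EX_NLM_F_ACK_TLVS	0x200	/* extended ACK TLVs are present */

struct ex_nlmsghdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

struct ex_nlmsgerr {
	int32_t error;
	struct ex_nlmsghdr msg;	/* header of the request being acknowledged */
};

/* Round up to a multiple of 4 (assumed netlink alignment). */
static size_t ex_align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Given the outer ACK header 'ack' and its payload 'err', return the byte
 * offset (from the start of the payload) where the TLVs begin, or -1 if
 * the ACK carries no TLVs.
 */
static long ex_ack_tlv_offset(const struct ex_nlmsghdr *ack,
			      const struct ex_nlmsgerr *err)
{
	assert(ack && err);

	/* Check ACK_TLVS first: older kernels may cap without setting flags. */
	if (!(ack->flags & EX_NLM_F_ACK_TLVS))
		return -1;

	size_t off = sizeof(*err);

	/* Uncapped ACKs echo the request payload before the TLVs. */
	if (!(ack->flags & EX_NLM_F_CAPPED) && err->msg.len > sizeof(err->msg))
		off += ex_align4(err->msg.len - sizeof(err->msg));

	return (long)off;
}
```

The order of the two flag checks follows the note above: NLM_F_CAPPED is only trusted once NLM_F_ACK_TLVS shows that the kernel knows about the new flags.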
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have extended error reporting and a new message format for
netlink ACK messages, also extend this to be able to return arbitrary
cookie data on success.
This will allow, for example, nl80211 to not send an extra message for
cookies identifying newly created objects, but return those directly
in the ACK message.
The cookie data size is currently limited to 20 bytes (since Jamal
talked about using SHA1 for identifiers.)
Thanks to Jamal Hadi Salim for bringing up this idea during the
discussions.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the extended ACK reporting struct down from generic netlink to
the families, using the existing struct genl_info for simplicity.
Also add support to set the extended ACK information from generic
netlink users.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.
Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a link transitions from LINK_FAIL to LINK_UP, the commit phase is
not called. This leads to an erroneous state in which the slave-link
state gets stuck in "going down" while its speed and duplex are perfectly
fine. This issue is a side-effect of splitting link-set into propose and
commit phases introduced by de77ecd4ef ("bonding: improve link-status
update in mii-monitoring").
This patch fixes these issues by calling commit phase whenever link
state change is proposed.
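The propose/commit split can be modelled with a small state holder. This is a toy sketch: `struct ex_slave`, the `EX_LINK_*` values and the `ex_` functions are hypothetical and only mirror the idea that a proposed state has no effect until it is committed.

```c
#include <assert.h>

enum ex_link {
	EX_LINK_UP,
	EX_LINK_FAIL,
	EX_LINK_DOWN,
	EX_LINK_BACK,
	EX_LINK_NOCHANGE,	/* no proposal pending */
};

/* Hypothetical slave with a committed state and a pending proposal. */
struct ex_slave {
	enum ex_link link;		/* committed state */
	enum ex_link link_new_state;	/* proposed state or EX_LINK_NOCHANGE */
};

static void ex_propose(struct ex_slave *s, enum ex_link state)
{
	assert(s);
	s->link_new_state = state;
}

/* Apply a pending proposal, if any, and clear it. */
static void ex_commit(struct ex_slave *s)
{
	assert(s);
	if (s->link_new_state == EX_LINK_NOCHANGE)
		return;
	s->link = s->link_new_state;
	s->link_new_state = EX_LINK_NOCHANGE;
}

/* Monitoring step in the spirit of the fix: every proposal is committed. */
static void ex_monitor_set(struct ex_slave *s, enum ex_link state)
{
	ex_propose(s, state);
	ex_commit(s);
}
```

A proposal that is never committed leaves `link` at its old value, which is the stuck state the message describes.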
Fixes: de77ecd4ef ("bonding: improve link-status update in mii-monitoring")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is necessary to provide ethtool support for displaying and
modifying parameters of dwc-xlgmac.
Signed-off-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the serial slave device for the WL1835 Bluetooth interface.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Wei Xu <xuwei5@hisilicon.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Turns out that the LL protocol and the TI-ST are the same thing AFAICT.
The TI-ST adds firmware loading, GPIO control, and shared access for
NFC, FM radio, etc. For now, we're only implementing what is needed for
BT. This mirrors other drivers like BCM and Intel, but uses the new
serdev bus.
The firmware loading is greatly simplified by using existing
infrastructure to send commands. It may be a bit slower than the
original code using synchronous functions, but the real bottleneck is
likely doing firmware load at 115.2kbps.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: linux-bluetooth@vger.kernel.org
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are no users of hci_uart_init_tty, so remove it.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: linux-bluetooth@vger.kernel.org
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add serial slave device binding for the TI WiLink series of Bluetooth/FM/GPS
devices.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: netdev@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
1. Don't get the metric RTAX_ADVMSS of dst.
There are two reasons.
1) Its caller dst_metric_advmss has already invoked dst_metric_advmss
before invoking default_advmss.
2) The ipv4_default_advmss is used to get the default mss, it should
not try to get the metric like ip6_default_advmss.
2. Use sizeof(tcphdr)+sizeof(iphdr) instead of literal 40.
3. Define one new macro IPV4_MAX_PMTU instead of 65535 according to
RFC 2675, section 5.1.
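The resulting computation can be sketched as a clamp. This is a userspace illustration, not the kernel function: header sizes are assumed to be 20 bytes each (no options), `min_advmss` is a hypothetical parameter for the lower bound, and the `EX_` names are local.

```c
#include <assert.h>

#define EX_IPV4_MAX_PMTU	65535u	/* largest IPv4 packet, per the text */
#define EX_IPHDR_LEN		20u	/* assumed sizeof(struct iphdr) */
#define EX_TCPHDR_LEN		20u	/* assumed sizeof(struct tcphdr) */

/*
 * Default advertised MSS for a device MTU: MTU minus the headers, clamped
 * below by 'min_advmss' and above by the largest MSS an IPv4 packet allows.
 */
static unsigned int ex_default_advmss(unsigned int mtu, unsigned int min_advmss)
{
	const unsigned int hdrs = EX_IPHDR_LEN + EX_TCPHDR_LEN;
	unsigned int advmss;

	assert(min_advmss <= EX_IPV4_MAX_PMTU - hdrs);

	advmss = mtu > hdrs ? mtu - hdrs : 0;
	if (advmss < min_advmss)
		advmss = min_advmss;
	if (advmss > EX_IPV4_MAX_PMTU - hdrs)
		advmss = EX_IPV4_MAX_PMTU - hdrs;
	return advmss;
}
```

Naming the header sum and the maximum packet size makes it visible where the former literals 40 and 65535 come from.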
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
rtnetlink: Cleanup user notifications for netdev events
Vlad's recent patch to add the event type to rtnetlink notifications
points out a number of redundant or unnecessary notifications sent to
userspace for events that are essentially internal to the kernel. Trim
the list to put a dent in the notification storm.
v2
- rebased to top of net-next with IFLA_EVENT patch reverted
- dropped removal of NETDEV_CHANGEINFODATA since it exists intentionally
only to send a message to userspace
- dropped NOTIFY_PEERS since Vlad says it is needed for macvlans
- add patches to remove NETDEV_CHANGEUPPER and NETDEV_CHANGE_TX_QUEUE_LEN
from the event list
====================
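The effect of the series can be pictured as a filter on event types. The sketch below is illustrative only: the enum is a local list of the events named in this series plus a placeholder, and `ex_event_notifies_userspace()` is a hypothetical function, not the rtnetlink notifier.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative list of the events named in this series; values are arbitrary. */
enum ex_netdev_event {
	EX_EVENT_OTHER,			/* placeholder for any other event */
	EX_NETDEV_CHANGEADDR,
	EX_NETDEV_POST_TYPE_CHANGE,
	EX_NETDEV_CHANGEUPPER,
	EX_NETDEV_PRECHANGEUPPER,
	EX_NETDEV_CHANGELOWERSTATE,
	EX_NETDEV_UDP_TUNNEL_PUSH_INFO,
	EX_NR_EVENTS,
};

/* Return true if the event should still produce a message for userspace. */
static bool ex_event_notifies_userspace(enum ex_netdev_event ev)
{
	assert((unsigned int)ev < EX_NR_EVENTS);

	switch (ev) {
	case EX_NETDEV_CHANGEADDR:		/* redundant message */
	case EX_NETDEV_POST_TYPE_CHANGE:	/* redundant message */
	case EX_NETDEV_CHANGEUPPER:		/* internal to the kernel */
	case EX_NETDEV_PRECHANGEUPPER:		/* internal to the kernel */
	case EX_NETDEV_CHANGELOWERSTATE:	/* internal to the kernel */
	case EX_NETDEV_UDP_TUNNEL_PUSH_INFO:	/* internal to the kernel */
		return false;
	default:
		return true;
	}
}
```

Each patch in the series corresponds to adding one case to the suppressed group.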
Signed-off-by: David S. Miller <davem@davemloft.net>
NETDEV_CHANGEUPPER is an internal event; do not generate userspace
notifications.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CHANGELOWERSTATE is an internal event; do not generate userspace
notifications.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PRECHANGEUPPER is an internal event; do not generate userspace
notifications.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing the master device for a link generates many messages; the one
generated for POST_TYPE_CHANGE is redundant:
[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UNKNOWN group default
link/ether 02:02:02:02:02:03 brd ff:ff:ff:ff:ff:ff
[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UNKNOWN group default
link/ether 02:02:02:02:02:03 brd ff:ff:ff:ff:ff:ff
Remove POST_TYPE_CHANGE from the list of notifiers that generate
notifications.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing hardware address generates redundant messages:
[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff
[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff
Do not send a notification for the CHANGEADDR notifier.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NETDEV_UDP_TUNNEL_PUSH_INFO is an internal notifier; there is nothing
userspace can do with it, so don't generate a netlink notification.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>