mirror of https://gitee.com/openkylin/linux.git
8657 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Dave Airlie | 8c86394470 |
Linux 3.18
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUhNLZAAoJEHm+PkMAQRiGAEcH/iclYDW7k2GKemMqboy+Ohmh +ELbQothNhlGZlS1wWdD69LBiiXkkQ+ufVYFh/hC0oy0gUdfPMt5t+bOHy6cjn6w 9zOcACtpDKnqbOwRqXZjZgNmIabk7lRjbn7GK4GQqpIaW4oO0FWcT91FFhtGSPDa tjtmGRqDmbNsqfzr18h0WPEpUZmT6MxIdv17AYDliPB1MaaRuAv1Kss05TJrXdfL Oucv+C0uwnybD9UWAz6pLJ3H/HR9VJFdkaJ4Y0pbCHAuxdd1+swoTpicluHlsJA1 EkK5iWQRMpcmGwKvB0unCAQljNpaJiq4/Tlmmv8JlYpMlmIiVLT0D8BZx5q05QQ= =oGNw -----END PGP SIGNATURE----- Merge tag 'v3.18' into drm-next Linux 3.18 Backmerge Linus tree into -next as we had conflicts in i915/radeon/nouveau, and everyone was solving them individually. * tag 'v3.18': (57 commits) Linux 3.18 watchdog: s3c2410_wdt: Fix the mask bit offset for Exynos7 uapi: fix to export linux/vm_sockets.h i2c: cadence: Set the hardware time-out register to maximum value i2c: davinci: generate STP always when NACK is received ahci: disable MSI on SAMSUNG 0xa800 SSD context_tracking: Restore previous state in schedule_user slab: fix nodeid bounds check for non-contiguous node IDs lib/genalloc.c: export devm_gen_pool_create() for modules mm: fix anon_vma_clone() error treatment mm: fix swapoff hang after page migration and fork fat: fix oops on corrupted vfat fs ipc/sem.c: fully initialize sem_array before making it visible drivers/input/evdev.c: don't kfree() a vmalloc address cxgb4: Fill in supported link mode for SFP modules xen-netfront: Remove BUGs on paged skb data which crosses a page boundary mm/vmpressure.c: fix race in vmpressure_work_fn() mm: frontswap: invalidate expired data on a dup-store failure mm: do not overwrite reserved pages counter at show_mem() drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6 ... Conflicts: drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/nouveau/nouveau_drm.c drivers/gpu/drm/radeon/radeon_cs.c |
|
Paul Mackerras | 7c3fbbdd04 |
slab: fix nodeid bounds check for non-contiguous node IDs
The bounds check for nodeid in ____cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading to
a panic at boot time. For example, on a POWER8 machine the node IDs are
typically 0, 1, 16 and 17. This means that num_online_nodes() returns
4, so when ____cache_alloc_node is called with nodeid = 16 the VM_BUG_ON
triggers, like this:
kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
Call Trace:
.____cache_alloc_node+0x5c/0x270 (unreliable)
.kmem_cache_alloc_node_trace+0xdc/0x360
.init_list+0x3c/0x128
.kmem_cache_init+0x1dc/0x258
.start_kernel+0x2a0/0x568
start_here_common+0x20/0xa8
To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int). The
check is there mainly to protect the array dereference in the get_node()
call in the next line, and the array being dereferenced is of size
MAX_NUMNODES. If the nodeid is in range but invalid (for example if the
node is off-line), the BUG_ON in the next line will catch that.
Fixes:
|
|
Daniel Forrest | c4ea95d7cd |
mm: fix anon_vma_clone() error treatment
Andrew Morton noticed that the error return from anon_vma_clone() was
being dropped and replaced with -ENOMEM (which is not itself a bug
because the only error return value from anon_vma_clone() is -ENOMEM).
I did an audit of callers of anon_vma_clone() and discovered an actual
bug where the error return was being lost. In __split_vma(), between
Linux 3.11 and 3.12 the code was changed so the err variable is used
before the call to anon_vma_clone() and the default initial value of
-ENOMEM is overwritten. So a failure of anon_vma_clone() will return
success since err at this point is now zero.
Below is a patch which fixes this bug and also propagates the error
return value from anon_vma_clone() in all cases.
Fixes:
|
|
Hugh Dickins | 2022b4d18a |
mm: fix swapoff hang after page migration and fork
I've been seeing swapoff hangs in recent testing: it's cycling around trying unsuccessfully to find an mm for some remaining pages of swap. I have been exercising swap and page migration more heavily recently, and now notice a long-standing error in copy_one_pte(): it's trying to add dst_mm to swapoff's mmlist when it finds a swap entry, but is doing so even when it's a migration entry or an hwpoison entry. Which wouldn't matter much, except it adds dst_mm next to src_mm, assuming src_mm is already on the mmlist: which may not be so. Then if pages are later swapped out from dst_mm, swapoff won't be able to find where to replace them. There's already a !non_swap_entry() test for stats: move that up before the swap_duplicate() and the addition to mmlist. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Kelley Nielsen <kelleynnn@gmail.com> Cc: <stable@vger.kernel.org> [2.6.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Andrew Morton | 91b57191cf |
mm/vmpressure.c: fix race in vmpressure_work_fn()
In some android devices, there will be a "divide by zero" exception. vmpr->scanned could be zero before spin_lock(&vmpr->sr_lock). Addresses https://bugzilla.kernel.org/show_bug.cgi?id=88051 [akpm@linux-foundation.org: neaten] Reported-by: ji_ang <ji_ang@163.com> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Weijie Yang | fb993fa1a2 |
mm: frontswap: invalidate expired data on a dup-store failure
If a frontswap dup-store failed, it should invalidate the expired page in the backend, or it could trigger some data corruption issue. Such as: 1. use zswap as the frontswap backend with writeback feature 2. store a swap page(version_1) to entry A, success 3. dup-store a newer page(version_2) to the same entry A, fail 4. use __swap_writepage() write version_2 page to swapfile, success 5. zswap do shrink, writeback version_1 page to swapfile 6. version_2 page is overwrited by version_1, data corrupt. This patch fixes this issue by invalidating expired data immediately when meet a dup-store failure. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Dave Airlie | e8115e79aa |
Linux 3.18-rc7
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUe7l9AAoJEHm+PkMAQRiGkGcIAIryQ7NKn4IaxUtS807Lx4Ih obEnx7nNKZTXCZpD/7XQGHMMJyozMJR50PHZESJoHu4Luhv9h7EFRnyJ6MdqMlwn zla3zY0yRsHwPoJKcHbSE0CPHZz0WPQHj7IEbM+XJz2tMNJfbgTrezElmcCM4DRp c9ae+ggwZ2cyNYM0r2RSwSJ525WMh69f9dzSUE27fpvkllQgwqNs/jHYz8HNOEht FWcv5UhvzKjwJS3awULfOB3zH2QdFvVTrwAzd+kbV2Q6T6CaUoFRlhXeKUO6W2Jv pJM6oj8tMZUkdXEv7EQXT1kwEqC4DULTTTHs4tSF79O1ESmNfePiOwwBcwoM2nM= =kG1Y -----END PGP SIGNATURE----- Merge tag 'v3.18-rc7' into drm-next This fixes a bunch of conflicts prior to merging i915 tree. Linux 3.18-rc7 Conflicts: drivers/gpu/drm/exynos/exynos_drm_drv.c drivers/gpu/drm/i915/i915_drv.c drivers/gpu/drm/i915/intel_pm.c drivers/gpu/drm/tegra/dc.c |
|
Linus Torvalds | 3865efcb14 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro: "Fix for a really embarrassing braino in iov_iter. Kudos to paulus..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Fix thinko in iov_iter_single_seg_count |
|
Tang Chen | 0bd8542008 |
mem-hotplug: reset node present pages when hot-adding a new pgdat
When memory is hot-added, all the memory is in offline state. So clear all zones' present_pages because they will be updated in online_pages() and offline_pages(). Otherwise, /proc/zoneinfo will corrupt: When the memory of node2 is offline: # cat /proc/zoneinfo ...... Node 2, zone Movable ...... spanned 8388608 present 8388608 managed 0 When we online memory on node2: # cat /proc/zoneinfo ...... Node 2, zone Movable ...... spanned 8388608 present 16777216 managed 8388608 Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Tang Chen | f784a3f196 |
mem-hotplug: reset node managed pages when hot-adding a new pgdat
In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value in the very beginning. Even if the memory on that node has node been onlined, /sys/device/system/node/nodeXXX/meminfo has wrong value: hot-add node2 (memory not onlined) cat /sys/device/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB This patch fixes this problem by reset node managed pages to 0 after hot-adding a new node. 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() 2. Make reset_node_managed_pages() non-static 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 57cbc87e03 |
mm/debug-pagealloc: correct freepage accounting and order resetting
One thing I did in this patch is fixing freepage accounting. If we clear guard page and link it onto isolate buddy list, we should not increase freepage count. This patch adds conditional branch to skip counting in this case. Without this patch, this overcounting happens frequently if guard order is set and CMA is used. Another thing fixed in this patch is the target to reset order. In __free_one_page(), we check the buddy page whether it is a guard page or not. And, if so, we should clear guard attribute on the buddy page and reset order of it to 0. But, current code resets original page's order rather than buddy one's. Maybe, this doesn't have any problem, because whole merged page's order will be re-assigned soon. But, it is better to correct code. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Vlastimil Babka | 1d5bfe1ffb |
mm, compaction: prevent infinite loop in compact_zone
Several people have reported occasionally seeing processes stuck in compact_zone(), even triggering soft lockups, in 3.18-rc2+. Testing a revert of commit |
|
Michal Nazarewicz | dae803e165 |
mm: alloc_contig_range: demote pages busy message from warn to info
Having test_pages_isolated failure message as a warning confuses users into thinking that it is more serious than it really is. In reality, if called via CMA, allocation will be retried so a single test_pages_isolated failure does not prevent allocation from succeeding. Demote the warning message to an info message and reformat it such that the text "failed" does not appear and instead a less worrying "PFNS busy" is used. This message is trivially reproducible on a 10GB x86 machine on 3.16.y kernels configured with CONFIG_DMA_CMA. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 95069ac8da |
mm/slab: fix unalignment problem on Malta with EVA due to slab merge
Unlike SLUB, sometimes, object isn't started at the beginning of the
slab in SLAB. This causes the unalignment problem after slab merging is
supported by commit
|
|
Joonsoo Kim | 3c605096d3 |
mm/page_alloc: restrict max order of merging on isolated pageblock
Current pageblock isolation logic could isolate each pageblock individually. This causes freepage accounting problem if freepage with pageblock order on isolate pageblock is merged with other freepage on normal pageblock. We can prevent merging by restricting max order of merging to pageblock order if freepage is on isolate pageblock. A side-effect of this change is that there could be non-merged buddy freepage even if finishing pageblock isolation, because undoing pageblock isolation is just to move freepage from isolate buddy list to normal buddy list rather than to consider merging. So, the patch also makes undoing pageblock isolation consider freepage merge. When un-isolation, freepage with more than pageblock order and it's buddy are checked. If they are on normal pageblock, instead of just moving, we isolate the freepage and free it in order to get merged. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 8f82b55dd5 |
mm/page_alloc: move freepage counting logic to __free_one_page()
All the caller of __free_one_page() has similar freepage counting logic, so we can move it to __free_one_page(). This reduce line of code and help future maintenance. This is also preparation step for "mm/page_alloc: restrict max order of merging on isolated pageblock" which fix the freepage counting problem on freepage with more than pageblock order. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 51bb1a4093 |
mm/page_alloc: add freepage on isolate pageblock to correct buddy list
In free_pcppages_bulk(), we use cached migratetype of freepage to determine type of buddy list where freepage will be added. This information is stored when freepage is added to pcp list, so if isolation of pageblock of this freepage begins after storing, this cached information could be stale. In other words, it has original migratetype rather than MIGRATE_ISOLATE. There are two problems caused by this stale information. One is that we can't keep these freepages from being allocated. Although this pageblock is isolated, freepage will be added to normal buddy list so that it could be allocated without any restriction. And the other problem is incorrect freepage accounting. Freepages on isolate pageblock should not be counted for number of freepage. Following is the code snippet in free_pcppages_bulk(). /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */ __free_one_page(page, page_to_pfn(page), zone, 0, mt); trace_mm_page_pcpu_drain(page, 0, mt); if (likely(!is_migrate_isolate_page(page))) { __mod_zone_page_state(zone, NR_FREE_PAGES, 1); if (is_migrate_cma(mt)) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); } As you can see above snippet, current code already handle second problem, incorrect freepage accounting, by re-fetching pageblock migratetype through is_migrate_isolate_page(page). But, because this re-fetched information isn't used for __free_one_page(), first problem would not be solved. This patch try to solve this situation to re-fetch pageblock migratetype before __free_one_page() and to use it for __free_one_page(). In addition to move up position of this re-fetch, this patch use optimization technique, re-fetching migratetype only if there is isolate pageblock. Pageblock isolation is rare event, so we can avoid re-fetching in common case with this optimization. This patch also correct migratetype of the tracepoint output. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | ad53f92eb4 |
mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
Before describing bugs itself, I first explain definition of freepage. 1. pages on buddy list are counted as freepage. 2. pages on isolate migratetype buddy list are *not* counted as freepage. 3. pages on cma buddy list are counted as CMA freepage, too. Now, I describe problems and related patch. Patch 1: There is race conditions on getting pageblock migratetype that it results in misplacement of freepages on buddy list, incorrect freepage count and un-availability of freepage. Patch 2: Freepages on pcp list could have stale cached information to determine migratetype of buddy list to go. This causes misplacement of freepages on buddy list and incorrect freepage count. Patch 4: Merging between freepages on different migratetype of pageblocks will cause freepages accouting problem. This patch fixes it. Without patchset [3], above problem doesn't happens on my CMA allocation test, because CMA reserved pages aren't used at all. So there is no chance for above race. With patchset [3], I did simple CMA allocation test and get below result: - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation - run kernel build (make -j16) on background - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval - Result: more than 5000 freepage count are missed With patchset [3] and this patchset, I found that no freepage count are missed so that I conclude that problems are solved. On my simple memory offlining test, these problems also occur on that environment, too. This patch (of 4): There are two paths to reach core free function of buddy allocator, __free_one_page(), one is free_one_page()->__free_one_page() and the other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page(). Each paths has race condition causing serious problems. At first, this patch is focused on first type of freepath. And then, following patch will solve the problem in second type of freepath. In the first type of freepath, we got migratetype of freeing page without holding the zone lock, so it could be racy. There are two cases of this race. 1. pages are added to isolate buddy list after restoring orignal migratetype CPU1 CPU2 get migratetype => return MIGRATE_ISOLATE call free_one_page() with MIGRATE_ISOLATE grab the zone lock unisolate pageblock release the zone lock grab the zone lock call __free_one_page() with MIGRATE_ISOLATE freepage go into isolate buddy list, although pageblock is already unisolated This may cause two problems. One is that we can't use this page anymore until next isolation attempt of this pageblock, because freepage is on isolate buddy list. The other is that freepage accouting could be wrong due to merging between different buddy list. Freepages on isolate buddy list aren't counted as freepage, but ones on normal buddy list are counted as freepage. If merge happens, buddy freepage on normal buddy list is inevitably moved to isolate buddy list without any consideration of freepage accouting so it could be incorrect. 2. pages are added to normal buddy list while pageblock is isolated. It is similar with above case. This also may cause two problems. One is that we can't keep these freepages from being allocated. Although this pageblock is isolated, freepage would be added to normal buddy list so that it could be allocated without any restriction. And the other problem is same as case 1, that it, incorrect freepage accouting. This race condition would be prevented by checking migratetype again with holding the zone lock. Because it is somewhat heavy operation and it isn't needed in common case, we want to avoid rechecking as much as possible. So this patch introduce new variable, nr_isolate_pageblock in struct zone to check if there is isolated pageblock. With this, we can avoid to re-check migratetype in common case and do it only if there is isolated pageblock or migratetype is MIGRATE_ISOLATE. This solve above mentioned problems. Changes from v3: Add one more check in free_one_page() that checks whether migratetype is MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 5842001630 |
mm/compaction: skip the range until proper target pageblock is met
Commit
|
|
Paul Mackerras | ad0eab9293 |
Fix thinko in iov_iter_single_seg_count
The branches of the if (i->type & ITER_BVEC) statement in iov_iter_single_seg_count() are the wrong way around; if ITER_BVEC is clear then we use i->bvec, when we should be using i->iov. This fixes it. In my case, the symptom that this caused was that a KVM guest doing filesystem operations on a virtual disk would result in one of qemu's threads on the host going into an infinite loop in generic_perform_write(). The loop would hit the copied == 0 case and call iov_iter_single_seg_count() to reduce the number of bytes to try to process, but because of the error, iov_iter_single_seg_count() would just return i->count and the loop made no progress and continued forever. Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
Linus Torvalds | 661b99e95f |
xfs: fixes for v3.18-rc3
This update fixes: - incorrect warnings about i_mutex locking in pagecache_isize_extended() and updates comments to match expected locking - another zero-range bug fix for stray file size updates - a bunch of fixes for regression in the bulkstat code introduced in 3.17. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJUXTquAAoJEK3oKUf0dfodzPQP+wVWm3suRpvfpOeljlwcCB1w r1MjJjEgKRmlRCwTe4IYSSS2ZBSz3f5qTunde1PEAcUyyf2gO5b/gV2WCaVDWIpV 0/1RDaXIRplTvY/i5UAOtqOSUpNwvWz7PCmCAR7RCHFfTyBBbFRlRdg1GPCrBAzv rf/kRm9C6fRHHwLojwNCLEzA0MAdjKVG05A+Xv3MnWcd9fRxtNryQnhTIUJoDCVl 5keebmizquoc88WoQxDX29j2Ce+yjMPj27YhB9Z09mfmFvbLHT46UP2jI5Ty+DX5 rJikXA5Jv6wtig9wsbsenK7fsy1CyAAbmxS8vjyHsdCSAMpN98HkndsSadF8nH5U sEh43OJjJS5LedVNqfV+LtI9ZD9+fGpAETQFVI8TpefFBX2aYw3fomWO0AUbRuMi s4f6iz2sDvgDa+oE38XqKif6CqFsBX0QQSuCDiPaMi79vy2VLE/I5SU7QAj42+BW sEyGVVcdJDsDpkUe6SicIpbNNwhXCR4GYc4jI4QYjDdcK6rrliCnsI8hHk5pPeKk Qvt6ERiP5dLvp9f6KEzvPAkdxBmDKOtHKMSEMJzBBDtxFJXStJNuYZVtMYb8JwJq nV0WKSoXR0xT/IeMw8336HF4GO04VCHjY3QnNh0qNdHYtfXJerZmpzVhae7dJmZZ nLFimb3q2TlfgEvkPqmi =8S0c -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "This update fixes a warning in the new pagecache_isize_extended() and updates some related comments, another fix for zero-range misbehaviour, and an unforntuately large set of fixes for regressions in the bulkstat code. The bulkstat fixes are large but necessary. I wouldn't normally push such a rework for a -rcX update, but right now xfsdump can silently create incomplete dumps on 3.17 and it's possible that even xfsrestore won't notice that the dumps were incomplete. Hence we need to get this update into 3.17-stable kernels ASAP. In more detail, the refactoring work I committed in 3.17 has exposed a major hole in our QA coverage. With both xfsdump (the major user of bulkstat) and xfsrestore silently ignoring missing files in the dump/restore process, incomplete dumps were going unnoticed if they were being triggered. Many of the dump/restore filesets were so small that they didn't evenhave a chance of triggering the loop iteration bugs we introduced in 3.17, so we didn't exercise the code sufficiently, either. We have already taken steps to improve QA coverage in xfstests to avoid this happening again, and I've done a lot of manual verification of dump/restore on very large data sets (tens of millions of inodes) of the past week to verify this patch set results in bulkstat behaving the same way as it does on 3.16. Unfortunately, the fixes are not exactly simple - in tracking down the problem historic API warts were discovered (e.g xfsdump has been working around a 20 year old bug in the bulkstat API for the past 10 years) and so that complicated the process of diagnosing and fixing the problems. i.e. we had to fix bugs in the code as well as discover and re-introduce the userspace visible API bugs that we unwittingly "fixed" in 3.17 that xfsdump relied on to work correctly. Summary: - incorrect warnings about i_mutex locking in pagecache_isize_extended() and updates comments to match expected locking - another zero-range bug fix for stray file size updates - a bunch of fixes for regression in the bulkstat code introduced in 3.17" * tag 'xfs-for-linus-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: track bulkstat progress by agino xfs: bulkstat error handling is broken xfs: bulkstat main loop logic is a mess xfs: bulkstat chunk-formatter has issues xfs: bulkstat chunk formatting cursor is broken xfs: bulkstat btree walk doesn't terminate mm: Fix comment before truncate_setsize() xfs: rework zero range to prevent invalid i_size updates mm: Remove false WARN_ON from pagecache_isize_extended() xfs: Check error during inode btree iteration in xfs_bulkstat() xfs: bulkstat doesn't release AGI buffer on error |
|
Jan Kara | 77783d0642 |
mm: Fix comment before truncate_setsize()
XFS doesn't always hold i_mutex when calling truncate_setsize() and it uses a different lock to serialize truncates and writes. So fix the comment before truncate_setsize(). Reported-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com> |
|
Linus Torvalds | f3ed88a6bc |
Merge branch 'fixes-for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull CMA and DMA-mapping fixes from Marek Szyprowski: "This contains important fixes for recently introduced highmem support for default contiguous memory region used for dma-mapping subsystem" * 'fixes-for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: mm, cma: make parameters order consistent in func declaration and definition mm: cma: Use %pa to print physical addresses mm: cma: Ensure that reservations never cross the low/high mem boundary mm: cma: Always consider a 0 base address reservation as dynamic mm: cma: Don't crash on allocation if CMA area can't be activated |
|
Jan Kara | f55fefd1a5 |
mm: Remove false WARN_ON from pagecache_isize_extended()
The WARN_ON checking whether i_mutex is held in pagecache_isize_extended() was wrong because some filesystems (e.g. XFS) use different locks for serialization of truncates / writes. So just remove the check. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> |
|
Konstantin Khlebnikov | 4d88e6f7d5 |
mm/balloon_compaction: fix deflation when compaction is disabled
If CONFIG_BALLOON_COMPACTION=n balloon_page_insert() does not link pages
with balloon and doesn't set PagePrivate flag, as a result
balloon_page_dequeue() cannot get any pages because it thinks that all
of them are isolated. Without balloon compaction nobody can isolate
ballooned pages. It's safe to remove this check.
Fixes:
|
|
Mikulas Patocka | 8aba7e0a2c |
mm/slab_common: don't check for duplicate cache names
The SLUB cache merges caches with the same size and alignment and there was long standing bug with this behavior: - create the cache named "foo" - create the cache named "bar" (which is merged with "foo") - delete the cache named "foo" (but it stays allocated because "bar" uses it) - create the cache named "foo" again - it fails because the name "foo" is already used That bug was fixed in commit |
|
Johannes Weiner | 8186eb6a79 |
mm: rmap: split out page_remove_file_rmap()
page_remove_rmap() has too many branches on PageAnon() and is hard to follow. Move the file part into a separate function. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Johannes Weiner | d7365e783e |
mm: memcontrol: fix missed end-writeback page accounting
Commit
|
|
Johannes Weiner | 3a3c02ecf7 |
mm: page-writeback: inline account_page_dirtied() into single caller
A follow-up patch would have changed the call signature. To save the trouble, just fold it instead. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: <stable@vger.kernel.org> [3.17.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Yasuaki Ishimatsu | 35dca71c1f |
memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()
When hot adding the same memory after hot removal, the following messages are shown: WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426() ... Call Trace: dump_stack+0x46/0x58 warn_slowpath_common+0x81/0xa0 warn_slowpath_null+0x1a/0x20 free_area_init_node+0x3fe/0x426 hotadd_new_pgdat+0x90/0x110 add_memory+0xd4/0x200 acpi_memory_device_add+0x1aa/0x289 acpi_bus_attach+0xfd/0x204 acpi_bus_attach+0x178/0x204 acpi_bus_scan+0x6a/0x90 acpi_device_hotplug+0xe8/0x418 acpi_hotplug_work_fn+0x1f/0x2b process_one_work+0x14e/0x3f0 worker_thread+0x11b/0x510 kthread+0xe1/0x100 ret_from_fork+0x7c/0xb0 The detaled explanation is as follows: When hot removing memory, pgdat is set to 0 in try_offline_node(). But if the pgdat is allocated by bootmem allocator, the clearing step is skipped. And when hot adding the same memory, the uninitialized pgdat is reused. But free_area_init_node() checks wether pgdat is set to zero. As a result, free_area_init_node() hits WARN_ON(). This patch clears pgdat which is allocated by bootmem allocator in try_offline_node(). Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
David Rientjes | 6d50e60cd2 |
mm, thp: fix collapsing of hugepages on madvise
If an anonymous mapping is not allowed to fault thp memory and then madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never collapse this memory into thp memory. This occurs because the madvise(2) handler for thp, hugepage_madvise(), clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags until the final action of madvise_behavior(). This causes the khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when the vma had previously had VM_NOHUGEPAGE set. Fix this by passing the correct vma flags to the khugepaged mm slot handler. There's no chance khugepaged can run on this vma until after madvise_behavior() returns since we hold mm->mmap_sem. It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags in hugepage_advise(), but I didn't want to introduce special case behavior into madvise_behavior(). I think it's best to just let it always set vma->vm_flags itself. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Suleiman Souhlal <suleiman@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Yu Zhao | 5ddacbe92b |
mm: free compound page with correct order
Compound page should be freed by put_page() or free_pages() with correct
order. Not doing so will cause tail pages leaked.
The compound order can be obtained by compound_order() or use
HPAGE_PMD_ORDER in our case. Some people would argue the latter is
faster but I prefer the former which is more general.
This bug was observed not just on our servers (the worst case we saw is
11G leaked on a 48G machine) but also on our workstations running Ubuntu
based distro.
$ cat /proc/vmstat | grep thp_zero_page_alloc
thp_zero_page_alloc 55
thp_zero_page_alloc_failed 0
This means there is (thp_zero_page_alloc - 1) * (2M - 4K) memory leaked.
Fixes:
|
|
Joonsoo Kim | 6ea41c0c0a |
mm/compaction.c: avoid premature range skip in isolate_migratepages_range
Commit
|
|
Wang Nan | 401507d67d |
cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.
Commit |
|
Will Deacon | ce9ec37bdd |
zap_pte_range: update addr when forcing flush after TLB batching faiure
When unmapping a range of pages in zap_pte_range, the page being unmapped is added to an mmu_gather_batch structure for asynchronous freeing. If we run out of space in the batch structure before the range has been completely unmapped, then we break out of the loop, force a TLB flush and free the pages that we have batched so far. If there are further pages to unmap, then we resume the loop where we left off. Unfortunately, we forget to update addr when we break out of the loop, which causes us to truncate the range being invalidated as the end address is exclusive. When we re-enter the loop at the same address, the page has already been freed and the pte_present test will fail, meaning that we do not reconsider the address for invalidation. This patch fixes the problem by incrementing addr by the PAGE_SIZE before breaking out of the loop on batch failure. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Laurent Pinchart | 56fa4f609b |
mm: cma: Use %pa to print physical addresses
Casting physical addresses to unsigned long and using %lu truncates the values on systems where physical addresses are larger than 32 bits. Use %pa and get rid of the cast instead. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> |
|
Laurent Pinchart | 16195ddd4e |
mm: cma: Ensure that reservations never cross the low/high mem boundary
Commit |
|
Laurent Pinchart | 800a85d3d2 |
mm: cma: Always consider a 0 base address reservation as dynamic
The fixed parameter to cma_declare_contiguous() tells the function whether the given base address must be honoured or should be considered as a hint only. The API considers a zero base address as meaning any base address, which must never be considered as a fixed value. Part of the implementation correctly checks both fixed and base != 0, but two locations check the fixed value only. Set fixed to false when base is 0 to fix that and simplify the code. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> |
|
Laurent Pinchart | f022d8cb7e |
mm: cma: Don't crash on allocation if CMA area can't be activated
If activation of the CMA area fails its mutex won't be initialized, leading to an oops at allocation time when trying to lock the mutex. Fix this by setting the cma area count field to 0 when activation fails, leading to allocation returning NULL immediately. Cc: <stable@vger.kernel.org> # v3.17 Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> |
|
Linus Torvalds | d1e14f1d63 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro: "overlayfs merge + leak fix for d_splice_alias() failure exits" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: overlayfs: embed middle into overlay_readdir_data overlayfs: embed root into overlay_readdir_data overlayfs: make ovl_cache_entry->name an array instead of pointer overlayfs: don't hold ->i_mutex over opening the real directory fix inode leaks on d_splice_alias() failure exits fs: limit filesystem stacking depth overlay: overlay filesystem documentation overlayfs: implement show_options overlayfs: add statfs support overlay filesystem shmem: support RENAME_WHITEOUT ext4: support RENAME_WHITEOUT vfs: add RENAME_WHITEOUT vfs: add whiteout support vfs: export check_sticky() vfs: introduce clone_private_mount() vfs: export __inode_permission() to modules vfs: export do_splice_direct() to modules vfs: add i_op->dentry_open() |
|
Linus Torvalds | 1c45d9a920 |
ACPI and power management updates for 3.18-rc2
- Fix for a recent PCI power management change that overlooked the fact that some IRQ chips might not be able to configure PCIe PME for system wakeup from Lucas Stach. - Fix for a bug introduced in 3.17 where acpi_device_wakeup() is called with a wrong ordering of arguments from Zhang Rui. - A bunch of intel_pstate driver fixes (all -stable candidates) from Dirk Brandewie, Gabriele Mazzotta and Pali Rohár. - Fixes for a rather long-standing problem with the OOM killer and the freezer that frozen processes killed by the OOM do not actually release any memory until they are thawed, so OOM-killing them is rather pointless, with a couple of cleanups on top (Michal Hocko, Cong Wang, Rafael J Wysocki). - ACPICA update to upstream release 20140926, inlcuding mostly cleanups reducing differences between the upstream ACPICA and the kernel code, tools changes (acpidump, acpiexec) and support for the _DDN object (Bob Moore, Lv Zheng). - New PM QoS class for memory bandwidth from Tomeu Vizoso. - Default 32-bit DMA mask for platform devices enumerated by ACPI (this change is mostly needed for some drivers development in progress targeted at 3.19) from Heikki Krogerus. - ACPI EC driver cleanups, mostly related to debugging, from Lv Zheng. - cpufreq-dt driver updates from Thomas Petazzoni. - powernv cpuidle driver update from Preeti U Murthy. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJUSjZFAAoJEILEb/54YlRxyfIP/irc/f7DDb0mElF755ANtSXp CTVIQSn6uZ2P//ElQO0+nckZSo39jrBkHVu11vDxmVt2PJE2VBgNjHJLyf1boaPI 9aR5kzVmL6jzJ9wA3gYqr91uCVegY1KDFx2KrAlrNomrlc2xtTGf6F17I4tI9qHL pgc8jhJZ1swn4wL0qnqffLsmx3Hoq3uIO5PNAXD+qUSgm5+8zZwLLlvnrM8upOO4 cHTvxh+ZwXrak4RO4NciYZPKJQAD47MTcJCDR/bg7MKxeiJPrzLrR+WrbCYr5md1 iSiVThZDZnnYTiDLPiemcXoe3jpG2bigXncxJVRDJ7MBOO7ZX7mppwdNnMaNM5kN 92kvLOy269NSS2SFJ0N/B6Xr1jQ0HEdwj7erl4xJIkobKRuvN9fYyVWkoL9i3sj4 OQ7fqhXoEON9CW0KwC5FRAswIungB//o5OjN7VlNKTBKfPdWAjgVQOyeeZ+gSoQo 9tbR/QEEEcHn8fiQpBM9cQw2NL0Rx1ZzHXs7dB0U6ynfG5Drge4OTTwl/Gm4mavB 8Tv3ji26VvQdFr+It2SsijjjjjzVIsdK5iUpSHYo876u4l20CEH3gSpVA/jNhgH6 HaAN5DYIot4Qq5ifjDydRT6WGIyxsVMk3SqehjF47TDaX4l1FbSYWGVyKxfjnQs3 2rWJ3yuDjH28Cfmi0MO0 =4Q8f -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "This is material that didn't make it to my 3.18-rc1 pull request for various reasons, mostly related to timing and travel (LinuxCon EU / LPC) plus a couple of fixes for recent bugs. The only really new thing here is the PM QoS class for memory bandwidth, but it is simple enough and users of it will be added in the next cycle. One major change in behavior is that platform devices enumerated by ACPI will use 32-bit DMA mask by default. Also included is an ACPICA update to a new upstream release, but that's mostly cleanups, changes in tools and similar. The rest is fixes and cleanups mostly. Specifics: - Fix for a recent PCI power management change that overlooked the fact that some IRQ chips might not be able to configure PCIe PME for system wakeup from Lucas Stach. - Fix for a bug introduced in 3.17 where acpi_device_wakeup() is called with a wrong ordering of arguments from Zhang Rui. - A bunch of intel_pstate driver fixes (all -stable candidates) from Dirk Brandewie, Gabriele Mazzotta and Pali Rohár. - Fixes for a rather long-standing problem with the OOM killer and the freezer that frozen processes killed by the OOM do not actually release any memory until they are thawed, so OOM-killing them is rather pointless, with a couple of cleanups on top (Michal Hocko, Cong Wang, Rafael J Wysocki). - ACPICA update to upstream release 20140926, inlcuding mostly cleanups reducing differences between the upstream ACPICA and the kernel code, tools changes (acpidump, acpiexec) and support for the _DDN object (Bob Moore, Lv Zheng). - New PM QoS class for memory bandwidth from Tomeu Vizoso. - Default 32-bit DMA mask for platform devices enumerated by ACPI (this change is mostly needed for some drivers development in progress targeted at 3.19) from Heikki Krogerus. - ACPI EC driver cleanups, mostly related to debugging, from Lv Zheng. - cpufreq-dt driver updates from Thomas Petazzoni. - powernv cpuidle driver update from Preeti U Murthy" * tag 'pm+acpi-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits) intel_pstate: Correct BYT VID values. intel_pstate: Fix BYT frequency reporting intel_pstate: Don't lose sysfs settings during cpu offline cpufreq: intel_pstate: Reflect current no_turbo state correctly cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy PCI / PM: handle failure to enable wakeup on PCIe PME ACPI: invoke acpi_device_wakeup() with correct parameters PM / freezer: Clean up code after recent fixes PM: convert do_each_thread to for_each_process_thread OOM, PM: OOM killed task shouldn't escape PM suspend freezer: remove obsolete comments in __thaw_task() freezer: Do not freeze tasks killed by OOM killer ACPI / platform: provide default DMA mask cpuidle: powernv: Populate cpuidle state details by querying the device-tree cpufreq: cpufreq-dt: adjust message related to regulators cpufreq: cpufreq-dt: extend with platform_data cpufreq: allow driver-specific data ACPI / EC: Cleanup coding style. ACPI / EC: Refine event/query debugging messages. ... |
|
Miklos Szeredi | 46fdb794e3 |
shmem: support RENAME_WHITEOUT
Allocate a dentry, initialize it with a whiteout and hash it in the place of the old dentry. Later the old dentry will be moved away and the whiteout will remain. i_mutex protects agains concurrent readdir. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Hugh Dickins <hughd@google.com> |
|
Michal Hocko | 5695be142e |
OOM, PM: OOM killed task shouldn't escape PM suspend
PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But OOM killer is allowed to kill an already frozen task in
order to handle OOM situtation. In order to protect from late wake ups
OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.
Reduce the race window by checking all tasks after OOM killer has been
disabled. This is still not race free completely unfortunately because
oom_killer_disable cannot stop an already ongoing OOM killer so a task
might still wake up from the fridge and get killed without
freeze_processes noticing. Full synchronization of OOM and freezer is,
however, too heavy weight for this highly unlikely case.
Introduce and check oom_kills counter which gets incremented early when
the allocator enters __alloc_pages_may_oom path and only check all the
tasks if the counter changes during the freezing attempt. The counter
is updated so early to reduce the race window since allocator checked
oom_killer_disabled which is set by PM-freezing code. A false positive
will push the PM-freezer into a slow path but that is not a big deal.
Changes since v1
- push the re-check loop out of freeze_processes into
check_frozen_processes and invert the condition to make the code more
readable as per Rafael
Fixes:
|
|
Linus Torvalds | c2661b8060 |
A large number of cleanups and bug fixes, with some (minor) journal
optimizations. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJUPlLCAAoJENNvdpvBGATwpN8P/jnbDL1RqM9ZEAWfbDhvYumR Fi59b3IDzSJHuuJeP0nTblVbbWclpO9ljCd18ttsHr8gBXA0ViaEU0XvWbpHIwPN 1fr1/Ovd0wvBdIVdLlaLXTR9skH4lbkiXxv/tkfjVCOSpzqiKID98Z72e/gUjB7Z 8xjAn/mTCnXKnhqMGzi8RC2MP1wgY//ErR21bj6so/8RC8zu4P6JuVj/hI6s0y5i IPtAmjhdM7nxnS0wJwj7dLT0yNDftDh69qE6CgIwyK+Xn/SZFgYwE6+l02dj3DET ZcAzTT9ToTMJdWtMu+5Y4LY8ObJ5xqMPbMoUclQ3DWe6nZicvtcBVCjfG/J8pFlY IFD0nfh/OpX9cQMwJ+5Y8P4TrMiqM+FfuLfu+X83gLyrAyIazwoaZls2lxlEyC0w M25oAqeKGUeVakVlmDZlVyBf05cu5m62x1rRvpcwMXMNhJl8/xwsSdhdYGeJfbO0 0MfL1n6GmvHvouMXKNsXlat/w3QVaQWVRzqdF9x7Q730fSHC/zxVGO+Po3jz2fBd fBdfE14BIIU7nkyBVy0CZG5SDmQW4YACocOv/ATmII9j76F9eZQ3zsA8J1x+dLmJ dP1Uxvsn1C3HW8Ua239j0XUJncglb06iEId0ywdkmWcc1rbzsyZ/NzXN/QBdZmqB 9g4GKAXAyh15PeBTJ5K/ =vWic -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A large number of cleanups and bug fixes, with some (minor) journal optimizations" [ This got sent to me before -rc1, but was stuck in my spam folder. - Linus ] * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits) ext4: check s_chksum_driver when looking for bg csum presence ext4: move error report out of atomic context in ext4_init_block_bitmap() ext4: Replace open coded mdata csum feature to helper function ext4: delete useless comments about ext4_move_extents ext4: fix reservation overflow in ext4_da_write_begin ext4: add ext4_iget_normal() which is to be used for dir tree lookups ext4: don't orphan or truncate the boot loader inode ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT ext4: optimize block allocation on grow indepth ext4: get rid of code duplication ext4: fix over-defensive complaint after journal abort ext4: fix return value of ext4_do_update_inode ext4: fix mmap data corruption when blocksize < pagesize vfs: fix data corruption when blocksize < pagesize for mmaped data ext4: fold ext4_nojournal_sops into ext4_sops ext4: support freezing ext2 (nojournal) file systems ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs() ext4: don't check quota format when there are no quota files jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list jbd2: avoid pointless scanning of checkpoint lists ... |
|
Linus Torvalds | d3dc366bba |
Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block
Pull core block layer changes from Jens Axboe: "This is the core block IO pull request for 3.18. Apart from the new and improved flush machinery for blk-mq, this is all mostly bug fixes and cleanups. - blk-mq timeout updates and fixes from Christoph. - Removal of REQ_END, also from Christoph. We pass it through the ->queue_rq() hook for blk-mq instead, freeing up one of the request bits. The space was overly tight on 32-bit, so Martin also killed REQ_KERNEL since it's no longer used. - blk integrity updates and fixes from Martin and Gu Zheng. - Update to the flush machinery for blk-mq from Ming Lei. Now we have a per hardware context flush request, which both cleans up the code should scale better for flush intensive workloads on blk-mq. - Improve the error printing, from Rob Elliott. - Backing device improvements and cleanups from Tejun. - Fixup of a misplaced rq_complete() tracepoint from Hannes. - Make blk_get_request() return error pointers, fixing up issues where we NULL deref when a device goes bad or missing. From Joe Lawrence. - Prep work for drastically reducing the memory consumption of dm devices from Junichi Nomura. This allows creating clone bio sets without preallocating a lot of memory. - Fix a blk-mq hang on certain combinations of queue depths and hardware queues from me. - Limit memory consumption for blk-mq devices for crash dump scenarios and drivers that use crazy high depths (certain SCSI shared tag setups). We now just use a single queue and limited depth for that" * 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits) block: Remove REQ_KERNEL blk-mq: allocate cpumask on the home node bio-integrity: remove the needless fail handle of bip_slab creating block: include func name in __get_request prints block: make blk_update_request print prefix match ratelimited prefix blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio block: fix alignment_offset math that assumes io_min is a power-of-2 blk-mq: Make bt_clear_tag() easier to read blk-mq: fix potential hang if rolling wakeup depth is too high block: add bioset_create_nobvec() block: use bio_clone_fast() in blk_rq_prep_clone() block: misplaced rq_complete tracepoint sd: Honor block layer integrity handling flags block: Replace strnicmp with strncasecmp block: Add T10 Protection Information functions block: Don't merge requests if integrity flags differ block: Integrity checksum flag block: Relocate bio integrity flags block: Add a disk flag to block integrity profile block: Add prefix to block integrity profile flags ... |
|
Linus Torvalds | dfe2c6dcc8 |
Merge branch 'akpm' (patches from Andrew Morton)
Merge second patch-bomb from Andrew Morton: - a few hotfixes - drivers/dma updates - MAINTAINERS updates - Quite a lot of lib/ updates - checkpatch updates - binfmt updates - autofs4 - drivers/rtc/ - various small tweaks to less used filesystems - ipc/ updates - kernel/watchdog.c changes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits) mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h> ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[] frv: remove unused declarations of __start___ex_table and __stop___ex_table kvm: ensure hard lockup detection is disabled by default kernel/watchdog.c: control hard lockup detection default staging: rtl8192u: use %*pEn to escape buffer staging: rtl8192e: use %*pEn to escape buffer staging: wlan-ng: use %*pEhp to print SN lib80211: remove unused print_ssid() wireless: hostap: proc: print properly escaped SSID wireless: ipw2x00: print SSID via %*pE wireless: libertas: print esaped string via %*pE lib/vsprintf: add %*pE[achnops] format specifier lib / string_helpers: introduce string_escape_mem() lib / string_helpers: refactoring the test suite lib / string_helpers: move documentation to c-file include/linux: remove strict_strto* definitions arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable fs: check bh blocknr earlier when searching lru ... |
|
Linus Torvalds | df133e8fa8 |
Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar: "This tree includes the following changes: - fix memory hotplug - fix hibernation bootup memory layout assumptions - fix hyperv numa guest kernel messages - remove dead code - update documentation" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Update memory map description to list hypervisor-reserved area x86/mm, hibernate: Do not assume the first e820 area to be RAM x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() x86/mm/hotplug: Modify PGD entry when removing memory x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable() x86: Remove set_pmd_pfn |
|
Peter Feiner | 64e455079e |
mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
For VMAs that don't want write notifications, PTEs created for read faults
have their write bit set. If the read fault happens after VM_SOFTDIRTY is
cleared, then the PTE's softdirty bit will remain clear after subsequent
writes.
Here's a simple code snippet to demonstrate the bug:
char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_SHARED, -1, 0);
system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
assert(*m == '\0'); /* new PTE allows write access */
assert(!soft_dirty(x));
*m = 'x'; /* should dirty the page */
assert(soft_dirty(x)); /* fails */
With this patch, write notifications are enabled when VM_SOFTDIRTY is
cleared. Furthermore, to avoid unnecessary faults, write notifications
are disabled when VM_SOFTDIRTY is set.
As a side effect of enabling and disabling write notifications with
care, this patch fixes a bug in mprotect where vm_page_prot bits set by
drivers were zapped on mprotect. An analogous bug was fixed in mmap by
commit
|
|
Marek Szyprowski | de9e14eebf |
drivers: dma-contiguous: add initialization from device tree
Add a function to create CMA region from previously reserved memory and add support for handling 'shared-dma-pool' reserved-memory device tree nodes. Based on previous code provided by Josh Cartwright <joshc@codeaurora.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Josh Cartwright <joshc@codeaurora.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Weijie Yang | 68faed630f |
mm/cma: fix cma bitmap aligned mask computing
The current cma bitmap aligned mask computation is incorrect. It could cause an unexpected alignment when using cma_alloc() if the wanted align order is larger than cma->order_per_bit. Take kvm for example (PAGE_SHIFT = 12), kvm_cma->order_per_bit is set to 6. When kvm_alloc_rma() tries to alloc kvm_rma_pages, it will use 15 as the expected align value. After using the current implementation however, we get 0 as cma bitmap aligned mask other than 511. This patch fixes the cma bitmap aligned mask calculation. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.17] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |