linux

Commit Graph

Author	SHA1	Message	Date
Andy Lutomirski	d30dd8be06	mm: track NR_KERNEL_STACK in KiB instead of number of stacks Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a zone. This only makes sense if each kernel stack exists entirely in one zone, and allowing vmapped stacks could break this assumption. Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on all architectures. Keep it simple and use KiB. Link: http://lkml.kernel.org/r/083c71e642c5fa5f1b6898902e1b2db7b48940d4.1468523549.git.luto@kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Mel Gorman	11fb998986	mm: move most file-based accounting to the node There are now a number of accounting oddities such as mapped file pages being accounted for on the node while the total number of file pages are accounted on the zone. This can be coped with to some extent but it's confusing so this patch moves the relevant file-based accounted. Due to throttling logic in the page allocator for reliable OOM detection, it is still necessary to track dirty and writeback pages on a per-zone basis. [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting] Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Mel Gorman	4b9d0fab71	mm: rename NR_ANON_PAGES to NR_ANON_MAPPED NR_FILE_PAGES is the number of file pages. NR_FILE_MAPPED is the number of mapped file pages. NR_ANON_PAGES is the number of mapped anon pages. This is unhelpful naming as it's easy to confuse NR_FILE_MAPPED and NR_ANON_PAGES for mapped pages. This patch renames NR_ANON_PAGES so we have NR_FILE_PAGES is the number of file pages. NR_FILE_MAPPED is the number of mapped file pages. NR_ANON_MAPPED is the number of mapped anon pages. Link: http://lkml.kernel.org/r/1467970510-21195-19-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Mel Gorman	50658e2e04	mm: move page mapped accounting to the node Reclaim makes decisions based on the number of pages that are mapped but it's mixing node and zone information. Account NR_FILE_MAPPED and NR_ANON_PAGES pages on the node. Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Mel Gorman	599d0c954f	mm, vmscan: move LRU lists to node This moves the LRU lists from the zone to the node and related data such as counters, tracing, congestion tracking and writeback tracking. Unfortunately, due to reclaim and compaction retry logic, it is necessary to account for the number of LRU pages on both zone and node logic. Most reclaim logic is based on the node counters but the retry logic uses the zone counters which do not distinguish inactive and active sizes. It would be possible to leave the LRU counters on a per-zone basis but it's a heavier calculation across multiple cache lines that is much more frequent than the retry checks. Other than the LRU counters, this is mostly a mechanical patch but note that it introduces a number of anomalies. For example, the scans are per-zone but using per-node counters. We also mark a node as congested when a zone is congested. This causes weird problems that are fixed later but is easier to review. In the event that there is excessive overhead on 32-bit systems due to the nodes being on LRU then there are two potential solutions 1. Long-term isolation of highmem pages when reclaim is lowmem When pages are skipped, they are immediately added back onto the LRU list. If lowmem reclaim persisted for long periods of time, the same highmem pages get continually scanned. The idea would be that lowmem keeps those pages on a separate list until a reclaim for highmem pages arrives that splices the highmem pages back onto the LRU. It potentially could be implemented similar to the UNEVICTABLE list. That would reduce the skip rate with the potential corner case is that highmem pages have to be scanned and reclaimed to free lowmem slab pages. 2. Linear scan lowmem pages if the initial LRU shrink fails This will break LRU ordering but may be preferable and faster during memory pressure than skipping LRU pages. Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Mel Gorman	75ef718405	mm, vmstat: add infrastructure for per-node vmstats Patchset: "Move LRU page reclaim from zones to nodes v9" This series moves LRUs from the zones to the node. While this is a current rebase, the test results were based on mmotm as of June 23rd. Conceptually, this series is simple but there are a lot of details. Some of the broad motivations for this are; 1. The residency of a page partially depends on what zone the page was allocated from. This is partially combatted by the fair zone allocation policy but that is a partial solution that introduces overhead in the page allocator paths. 2. Currently, reclaim on node 0 behaves slightly different to node 1. For example, direct reclaim scans in zonelist order and reclaims even if the zone is over the high watermark regardless of the age of pages in that LRU. Kswapd on the other hand starts reclaim on the highest unbalanced zone. A difference in distribution of file/anon pages due to when they were allocated results can result in a difference in again. While the fair zone allocation policy mitigates some of the problems here, the page reclaim results on a multi-zone node will always be different to a single-zone node. it was scheduled on as a result. 3. kswapd and the page allocator scan zones in the opposite order to avoid interfering with each other but it's sensitive to timing. This mitigates the page allocator using pages that were allocated very recently in the ideal case but it's sensitive to timing. When kswapd is allocating from lower zones then it's great but during the rebalancing of the highest zone, the page allocator and kswapd interfere with each other. It's worse if the highest zone is small and difficult to balance. 4. slab shrinkers are node-based which makes it harder to identify the exact relationship between slab reclaim and LRU reclaim. The reason we have zone-based reclaim is that we used to have large highmem zones in common configurations and it was necessary to quickly find ZONE_NORMAL pages for reclaim. Today, this is much less of a concern as machines with lots of memory will (or should) use 64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are rare. Machines that do use highmem should have relatively low highmem:lowmem ratios than we worried about in the past. Conceptually, moving to node LRUs should be easier to understand. The page allocator plays fewer tricks to game reclaim and reclaim behaves similarly on all nodes. The series has been tested on a 16 core UMA machine and a 2-socket 48 core NUMA machine. The UMA results are presented in most cases as the NUMA machine behaved similarly. pagealloc --------- This is a microbenchmark that shows the benefit of removing the fair zone allocation policy. It was tested uip to order-4 but only orders 0 and 1 are shown as the other orders were comparable. 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v9 Min total-odr0-1 490.00 ( 0.00%) 457.00 ( 6.73%) Min total-odr0-2 347.00 ( 0.00%) 329.00 ( 5.19%) Min total-odr0-4 288.00 ( 0.00%) 273.00 ( 5.21%) Min total-odr0-8 251.00 ( 0.00%) 239.00 ( 4.78%) Min total-odr0-16 234.00 ( 0.00%) 222.00 ( 5.13%) Min total-odr0-32 223.00 ( 0.00%) 211.00 ( 5.38%) Min total-odr0-64 217.00 ( 0.00%) 208.00 ( 4.15%) Min total-odr0-128 214.00 ( 0.00%) 204.00 ( 4.67%) Min total-odr0-256 250.00 ( 0.00%) 230.00 ( 8.00%) Min total-odr0-512 271.00 ( 0.00%) 269.00 ( 0.74%) Min total-odr0-1024 291.00 ( 0.00%) 282.00 ( 3.09%) Min total-odr0-2048 303.00 ( 0.00%) 296.00 ( 2.31%) Min total-odr0-4096 311.00 ( 0.00%) 309.00 ( 0.64%) Min total-odr0-8192 316.00 ( 0.00%) 314.00 ( 0.63%) Min total-odr0-16384 317.00 ( 0.00%) 315.00 ( 0.63%) Min total-odr1-1 742.00 ( 0.00%) 712.00 ( 4.04%) Min total-odr1-2 562.00 ( 0.00%) 530.00 ( 5.69%) Min total-odr1-4 457.00 ( 0.00%) 433.00 ( 5.25%) Min total-odr1-8 411.00 ( 0.00%) 381.00 ( 7.30%) Min total-odr1-16 381.00 ( 0.00%) 356.00 ( 6.56%) Min total-odr1-32 372.00 ( 0.00%) 346.00 ( 6.99%) Min total-odr1-64 372.00 ( 0.00%) 343.00 ( 7.80%) Min total-odr1-128 375.00 ( 0.00%) 351.00 ( 6.40%) Min total-odr1-256 379.00 ( 0.00%) 351.00 ( 7.39%) Min total-odr1-512 385.00 ( 0.00%) 355.00 ( 7.79%) Min total-odr1-1024 386.00 ( 0.00%) 358.00 ( 7.25%) Min total-odr1-2048 390.00 ( 0.00%) 362.00 ( 7.18%) Min total-odr1-4096 390.00 ( 0.00%) 362.00 ( 7.18%) Min total-odr1-8192 388.00 ( 0.00%) 363.00 ( 6.44%) This shows a steady improvement throughout. The primary benefit is from reduced system CPU usage which is obvious from the overall times; 4.7.0-rc4 4.7.0-rc4 mmotm-20160623nodelru-v8 User 189.19 191.80 System 2604.45 2533.56 Elapsed 2855.30 2786.39 The vmstats also showed that the fair zone allocation policy was definitely removed as can be seen here; 4.7.0-rc3 4.7.0-rc3 mmotm-20160623 nodelru-v8 DMA32 allocs 28794729769 0 Normal allocs 48432501431 77227309877 Movable allocs 0 0 tiobench on ext4 ---------------- tiobench is a benchmark that artifically benefits if old pages remain resident while new pages get reclaimed. The fair zone allocation policy mitigates this problem so pages age fairly. While the benchmark has problems, it is important that tiobench performance remains constant as it implies that page aging problems that the fair zone allocation policy fixes are not re-introduced. 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v9 Min PotentialReadSpeed 89.65 ( 0.00%) 90.21 ( 0.62%) Min SeqRead-MB/sec-1 82.68 ( 0.00%) 82.01 ( -0.81%) Min SeqRead-MB/sec-2 72.76 ( 0.00%) 72.07 ( -0.95%) Min SeqRead-MB/sec-4 75.13 ( 0.00%) 74.92 ( -0.28%) Min SeqRead-MB/sec-8 64.91 ( 0.00%) 65.19 ( 0.43%) Min SeqRead-MB/sec-16 62.24 ( 0.00%) 62.22 ( -0.03%) Min RandRead-MB/sec-1 0.88 ( 0.00%) 0.88 ( 0.00%) Min RandRead-MB/sec-2 0.95 ( 0.00%) 0.92 ( -3.16%) Min RandRead-MB/sec-4 1.43 ( 0.00%) 1.34 ( -6.29%) Min RandRead-MB/sec-8 1.61 ( 0.00%) 1.60 ( -0.62%) Min RandRead-MB/sec-16 1.80 ( 0.00%) 1.90 ( 5.56%) Min SeqWrite-MB/sec-1 76.41 ( 0.00%) 76.85 ( 0.58%) Min SeqWrite-MB/sec-2 74.11 ( 0.00%) 73.54 ( -0.77%) Min SeqWrite-MB/sec-4 80.05 ( 0.00%) 80.13 ( 0.10%) Min SeqWrite-MB/sec-8 72.88 ( 0.00%) 73.20 ( 0.44%) Min SeqWrite-MB/sec-16 75.91 ( 0.00%) 76.44 ( 0.70%) Min RandWrite-MB/sec-1 1.18 ( 0.00%) 1.14 ( -3.39%) Min RandWrite-MB/sec-2 1.02 ( 0.00%) 1.03 ( 0.98%) Min RandWrite-MB/sec-4 1.05 ( 0.00%) 0.98 ( -6.67%) Min RandWrite-MB/sec-8 0.89 ( 0.00%) 0.92 ( 3.37%) Min RandWrite-MB/sec-16 0.92 ( 0.00%) 0.93 ( 1.09%) 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 approx-v9 User 645.72 525.90 System 403.85 331.75 Elapsed 6795.36 6783.67 This shows that the series has little or not impact on tiobench which is desirable and a reduction in system CPU usage. It indicates that the fair zone allocation policy was removed in a manner that didn't reintroduce one class of page aging bug. There were only minor differences in overall reclaim activity 4.7.0-rc4 4.7.0-rc4 mmotm-20160623nodelru-v8 Minor Faults 645838 647465 Major Faults 573 640 Swap Ins 0 0 Swap Outs 0 0 DMA allocs 0 0 DMA32 allocs 46041453 44190646 Normal allocs 78053072 79887245 Movable allocs 0 0 Allocation stalls 24 67 Stall zone DMA 0 0 Stall zone DMA32 0 0 Stall zone Normal 0 2 Stall zone HighMem 0 0 Stall zone Movable 0 65 Direct pages scanned 10969 30609 Kswapd pages scanned 93375144 93492094 Kswapd pages reclaimed 93372243 93489370 Direct pages reclaimed 10969 30609 Kswapd efficiency 99% 99% Kswapd velocity 13741.015 13781.934 Direct efficiency 100% 100% Direct velocity 1.614 4.512 Percentage direct scans 0% 0% kswapd activity was roughly comparable. There were differences in direct reclaim activity but negligible in the context of the overall workload (velocity of 4 pages per second with the patches applied, 1.6 pages per second in the baseline kernel). pgbench read-only large configuration on ext4 --------------------------------------------- pgbench is a database benchmark that can be sensitive to page reclaim decisions. This also checks if removing the fair zone allocation policy is safe pgbench Transactions 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v8 Hmean 1 188.26 ( 0.00%) 189.78 ( 0.81%) Hmean 5 330.66 ( 0.00%) 328.69 ( -0.59%) Hmean 12 370.32 ( 0.00%) 380.72 ( 2.81%) Hmean 21 368.89 ( 0.00%) 369.00 ( 0.03%) Hmean 30 382.14 ( 0.00%) 360.89 ( -5.56%) Hmean 32 428.87 ( 0.00%) 432.96 ( 0.95%) Negligible differences again. As with tiobench, overall reclaim activity was comparable. bonnie++ on ext4 ---------------- No interesting performance difference, negligible differences on reclaim stats. paralleldd on ext4 ------------------ This workload uses varying numbers of dd instances to read large amounts of data from disk. 4.7.0-rc3 4.7.0-rc3 mmotm-20160623 nodelru-v9 Amean Elapsd-1 186.04 ( 0.00%) 189.41 ( -1.82%) Amean Elapsd-3 192.27 ( 0.00%) 191.38 ( 0.46%) Amean Elapsd-5 185.21 ( 0.00%) 182.75 ( 1.33%) Amean Elapsd-7 183.71 ( 0.00%) 182.11 ( 0.87%) Amean Elapsd-12 180.96 ( 0.00%) 181.58 ( -0.35%) Amean Elapsd-16 181.36 ( 0.00%) 183.72 ( -1.30%) 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v9 User 1548.01 1552.44 System 8609.71 8515.08 Elapsed 3587.10 3594.54 There is little or no change in performance but some drop in system CPU usage. 4.7.0-rc3 4.7.0-rc3 mmotm-20160623 nodelru-v9 Minor Faults 362662 367360 Major Faults 1204 1143 Swap Ins 22 0 Swap Outs 2855 1029 DMA allocs 0 0 DMA32 allocs 31409797 28837521 Normal allocs 46611853 49231282 Movable allocs 0 0 Direct pages scanned 0 0 Kswapd pages scanned 40845270 40869088 Kswapd pages reclaimed 40830976 40855294 Direct pages reclaimed 0 0 Kswapd efficiency 99% 99% Kswapd velocity 11386.711 11369.769 Direct efficiency 100% 100% Direct velocity 0.000 0.000 Percentage direct scans 0% 0% Page writes by reclaim 2855 1029 Page writes file 0 0 Page writes anon 2855 1029 Page reclaim immediate 771 1628 Sector Reads 293312636 293536360 Sector Writes 18213568 18186480 Page rescued immediate 0 0 Slabs scanned 128257 132747 Direct inode steals 181 56 Kswapd inode steals 59 1131 It basically shows that kswapd was active at roughly the same rate in both kernels. There was also comparable slab scanning activity and direct reclaim was avoided in both cases. There appears to be a large difference in numbers of inodes reclaimed but the workload has few active inodes and is likely a timing artifact. stutter ------- stutter simulates a simple workload. One part uses a lot of anonymous memory, a second measures mmap latency and a third copies a large file. The primary metric is checking for mmap latency. stutter 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v8 Min mmap 16.6283 ( 0.00%) 13.4258 ( 19.26%) 1st-qrtle mmap 54.7570 ( 0.00%) 34.9121 ( 36.24%) 2nd-qrtle mmap 57.3163 ( 0.00%) 46.1147 ( 19.54%) 3rd-qrtle mmap 58.9976 ( 0.00%) 47.1882 ( 20.02%) Max-90% mmap 59.7433 ( 0.00%) 47.4453 ( 20.58%) Max-93% mmap 60.1298 ( 0.00%) 47.6037 ( 20.83%) Max-95% mmap 73.4112 ( 0.00%) 82.8719 (-12.89%) Max-99% mmap 92.8542 ( 0.00%) 88.8870 ( 4.27%) Max mmap 1440.6569 ( 0.00%) 121.4201 ( 91.57%) Mean mmap 59.3493 ( 0.00%) 42.2991 ( 28.73%) Best99%Mean mmap 57.2121 ( 0.00%) 41.8207 ( 26.90%) Best95%Mean mmap 55.9113 ( 0.00%) 39.9620 ( 28.53%) Best90%Mean mmap 55.6199 ( 0.00%) 39.3124 ( 29.32%) Best50%Mean mmap 53.2183 ( 0.00%) 33.1307 ( 37.75%) Best10%Mean mmap 45.9842 ( 0.00%) 20.4040 ( 55.63%) Best5%Mean mmap 43.2256 ( 0.00%) 17.9654 ( 58.44%) Best1%Mean mmap 32.9388 ( 0.00%) 16.6875 ( 49.34%) This shows a number of improvements with the worst-case outlier greatly improved. Some of the vmstats are interesting 4.7.0-rc4 4.7.0-rc4 mmotm-20160623nodelru-v8 Swap Ins 163 502 Swap Outs 0 0 DMA allocs 0 0 DMA32 allocs 618719206 1381662383 Normal allocs 891235743 564138421 Movable allocs 0 0 Allocation stalls 2603 1 Direct pages scanned 216787 2 Kswapd pages scanned 50719775 41778378 Kswapd pages reclaimed 41541765 41777639 Direct pages reclaimed 209159 0 Kswapd efficiency 81% 99% Kswapd velocity 16859.554 14329.059 Direct efficiency 96% 0% Direct velocity 72.061 0.001 Percentage direct scans 0% 0% Page writes by reclaim 6215049 0 Page writes file 6215049 0 Page writes anon 0 0 Page reclaim immediate 70673 90 Sector Reads 81940800 81680456 Sector Writes 100158984 98816036 Page rescued immediate 0 0 Slabs scanned 1366954 22683 While this is not guaranteed in all cases, this particular test showed a large reduction in direct reclaim activity. It's also worth noting that no page writes were issued from reclaim context. This series is not without its hazards. There are at least three areas that I'm concerned with even though I could not reproduce any problems in that area. 1. Reclaim/compaction is going to be affected because the amount of reclaim is no longer targetted at a specific zone. Compaction works on a per-zone basis so there is no guarantee that reclaiming a few THP's worth page pages will have a positive impact on compaction success rates. 2. The Slab/LRU reclaim ratio is affected because the frequency the shrinkers are called is now different. This may or may not be a problem but if it is, it'll be because shrinkers are not called enough and some balancing is required. 3. The anon/file reclaim ratio may be affected. Pages about to be dirtied are distributed between zones and the fair zone allocation policy used to do something very similar for anon. The distribution is now different but not necessarily in any way that matters but it's still worth bearing in mind. VM statistic counters for reclaim decisions are zone-based. If the kernel is to reclaim on a per-node basis then we need to track per-node statistics but there is no infrastructure for that. The most notable change is that the old node_page_state is renamed to sum_zone_node_page_state. The new node_page_state takes a pglist_data and uses per-node stats but none exist yet. There is some renaming such as vm_stat to vm_zone_stat and the addition of vm_node_stat and the renaming of mod_state to mod_zone_state. Otherwise, this is mostly a mechanical patch with no functional change. There is a lot of similarity between the node and zone helpers which is unfortunate but there was no obvious way of reusing the code and maintaining type safety. Link: http://lkml.kernel.org/r/1467970510-21195-2-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Linus Torvalds	0e06f5c0de	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - a few misc bits - ocfs2 - most(?) of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits) thp: fix comments of __pmd_trans_huge_lock() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root mm: memcontrol: fix documentation for compound parameter mm: memcontrol: remove BUG_ON in uncharge_list mm: fix build warnings in <linux/compaction.h> mm, thp: convert from optimistic swapin collapsing to conservative mm, thp: fix comment inconsistency for swapin readahead functions thp: update Documentation/{vm/transhuge,filesystems/proc}.txt shmem: split huge pages beyond i_size under memory pressure thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE khugepaged: add support of collapse for tmpfs/shmem pages shmem: make shmem_inode_info::lock irq-safe khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() thp: extract khugepaged from mm/huge_memory.c shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings shmem: add huge pages support shmem: get_unmapped_area align huge page shmem: prepare huge= mount option and sysfs knob mm, rmap: account shmem thp pages ...	2016-07-26 19:55:54 -07:00
Linus Torvalds	ae9799975c	regmap: Updates for v4.8 Several small updates and API enhancements: - Provide transparent unrolling of bulk writes into individual writes so they can be used with devices without raw formatting. - Fix compatibility between I2C controllers supporting block commands and devices with more than 8 bit wide registers. - Add some helpers for iopoll-like functionality and workarounds for weird interrupt controllers. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXljTLAAoJECTWi3JdVIfQMSMH/1MRirJkwIds0SRbWcnG9hOe nMfUhkuFuCZJN82GwOh7YbAxLB1pGXKsGZq1XGMYFvwJawEizR3o83fgcRy1LQmG KOQHO1O/CWQqRo1ntGl33H2thhq6e67y6lpeMBhdMvp3VjqdyzlILNSva6iWbW/R N0iFSSKd4VdHlg/o+R3k7EzWCLZ4FZI7B9/5OjxWkoUR+tloNI8rR8Ecwl6+Q37K 8+yo6R0Ntx/JXB7JvA5Y8vqPdUGi68+XpcnaAcpM5DaL95+lqzduWCjKfjG5oatZ 3o/ORRVi+WyD95/NLXwKFMvSDSGokVDrU87HNgKph+f4sNF5TIoC/E+JhPUQBSU= =7BlM -----END PGP SIGNATURE----- Merge tag 'regmap-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "Several small updates and API enhancements: - provide transparent unrolling of bulk writes into individual writes so they can be used with devices without raw formatting. - fix compatibility between I2C controllers supporting block commands and devices with more than 8 bit wide registers. - add some helpers for iopoll-like functionality and workarounds for weird interrupt controllers" * tag 'regmap-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: add iopoll-like polling macro regmap: Support bulk writes for devices without raw formatting regmap-i2c: Use i2c block command only if register value width is 8 bit regmap: irq: Add support to call client specific pre/post interrupt service regmap: Add file patterns for regmap device tree bindings	2016-07-26 19:24:33 -07:00
Linus Torvalds	6453dbdda3	Power management material for v4.8-rc1 - Rework the cpufreq governor interface to make it more straightforward and modify the conservative governor to avoid using transition notifications (Rafael Wysocki). - Rework the handling of frequency tables by the cpufreq core to make it more efficient (Viresh Kumar). - Modify the schedutil governor to reduce the number of wakeups it causes to occur in cases when the CPU frequency doesn't need to be changed (Steve Muckle, Viresh Kumar). - Fix some minor issues and clean up code in the cpufreq core and governors (Rafael Wysocki, Viresh Kumar). - Add Intel Broxton support to the intel_pstate driver (Srinivas Pandruvada). - Fix problems related to the config TDP feature and to the validity of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka, Srinivas Pandruvada). - Make intel_pstate update the cpu_frequency tracepoint even if the frequency doesn't change to avoid confusing powertop (Rafael Wysocki). - Clean up the usage of __init/__initdata in intel_pstate, mark some of its internal variables as __read_mostly and drop an unused structure element from it (Jisheng Zhang, Carsten Emde). - Clean up the usage of some duplicate MSR symbols in intel_pstate and turbostat (Srinivas Pandruvada). - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay Adiga, Viresh Kumar, Ben Dooks). - Fix a regression (introduced during the 4.5 cycle) in the pcc-cpufreq driver by reverting the problematic commit (Andreas Herrmann). - Add support for Intel Denverton to intel_idle, clean up Broxton support in it and make it explicitly non-modular (Jacob Pan, Jan Beulich, Paul Gortmaker). - Add support for Denverton and Ivy Bridge server to the Intel RAPL power capping driver and make it more careful about the handing of MSRs that may not be present (Jacob Pan, Xiaolong Wang). - Fix resume from hibernation on x86-64 by making the CPU offline during resume avoid using MONITOR/MWAIT in the "play dead" loop which may lead to an inadvertent "revival" of a "dead" CPU and a page fault leading to a kernel crash from it (Rafael Wysocki). - Make memory management during resume from hibernation more straightforward (Rafael Wysocki). - Add debug features that should help to detect problems related to hibernation and resume from it (Rafael Wysocki, Chen Yu). - Clean up hibernation core somewhat (Rafael Wysocki). - Prevent KASAN from instrumenting the hibernation core which leads to large numbers of false-positives from it (James Morse). - Prevent PM (hibernate and suspend) notifiers from being called during the cleanup phase if they have not been called during the corresponding preparation phase which is possible if one of the other notifiers returns an error at that time (Lianwei Wang). - Improve suspend-related debug printout in the tasks freezer and clean up suspend-related console handling (Roger Lu, Borislav Petkov). - Update the AnalyzeSuspend script in the kernel sources to version 4.2 (Todd Brandt). - Modify the generic power domains framework to make it handle system suspend/resume better (Ulf Hansson). - Make the runtime PM framework avoid resuming devices synchronously when user space changes the runtime PM settings for them and improve its error reporting (Rafael Wysocki, Linus Walleij). - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus) and in the core, make some devfreq code explicitly non-modular and change some of it into tristate (Bartlomiej Zolnierkiewicz, Peter Chen, Paul Gortmaker). - Add DT support to the generic PM clocks management code and make it export some more symbols (Jon Hunter, Paul Gortmaker). - Make the PCI PM core code slightly more robust against possible driver errors (Andy Shevchenko). - Make it possible to change DESTDIR and PREFIX in turbostat (Andy Shevchenko). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJXl7/dAAoJEILEb/54YlRx+VgQAIQJOWvxKew3Yl02c/sdj9OT 5VNnFrzGzdcAPofvvG9qGq8B0Es1vYehJpwwOB21ri8EvYv0riIiU1yrqslObojQ oaZOkSBpbIoKjGR4CpYA/A+feE+8EqIBdPGd+lx5a6oRdUi7tRVHBG9lyLO3FB/i jan1q8dMpZsmu+Y+rVVHGnCVuIlIEqr2ZnZfCwDAulO2Arp/QFAh4kH08ELATvrl bkPa25vq7/VMP/vCDzrfZKD5mUuKogIRu/J5wx4py1nE+FB35cKKyqBOgklLwAeY UI8vjDhr/myNUs54AZlktOkq47TCYvjvhX9kmOxBjuWqFbRusU012IRek1fYPRIV ZqbkqNX7UEVQwunAEg9AyFwyzEtOht93dQDT5RLEd4QzKuM76gmHpLeTGGMzE+nu FnmF9JGl4DVwqpZl9yU2+hR2Mt3bP8OF8qYmNiGUB3KO4emPslhSd+6y8liA5Bx2 SJf0Gb//vaHCh3/uMnwAonYPqRkZvBLOMwuL1VUjNQfRMnQtDdgHMYB1aT/EglPA 8ww6j4J8rVRLAxvYQ3UEmNA/vBNclKXblRR18+JddEZP9/oX0ATfwnCCUpr839uk xxyQhrm4/AI60+PHWCX4GG80YrKdOGTkF7LXCQZanVWjjuyF17rufegZ2YWLT07v JU1Cmumfdy2jJluT8xsR =uVGz -----END PGP SIGNATURE----- Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Again, the majority of changes go into the cpufreq subsystem, but there are no big features this time. The cpufreq changes that stand out somewhat are the governor interface rework and improvements related to the handling of frequency tables. Apart from those, there are fixes and new device/CPU IDs in drivers, cleanups and an improvement of the new schedutil governor. Next, there are some changes in the hibernation core, including a fix for a nasty problem related to the MONITOR/MWAIT usage by CPU offline during resume from hibernation, a few core improvements related to memory management during resume, a couple of additional debug features and cleanups. Finally, we have some fixes and cleanups in the devfreq subsystem, generic power domains framework improvements related to system suspend/resume, support for some new chips in intel_idle and in the power capping RAPL driver, a new version of the AnalyzeSuspend utility and some assorted fixes and cleanups. Specifics: - Rework the cpufreq governor interface to make it more straightforward and modify the conservative governor to avoid using transition notifications (Rafael Wysocki). - Rework the handling of frequency tables by the cpufreq core to make it more efficient (Viresh Kumar). - Modify the schedutil governor to reduce the number of wakeups it causes to occur in cases when the CPU frequency doesn't need to be changed (Steve Muckle, Viresh Kumar). - Fix some minor issues and clean up code in the cpufreq core and governors (Rafael Wysocki, Viresh Kumar). - Add Intel Broxton support to the intel_pstate driver (Srinivas Pandruvada). - Fix problems related to the config TDP feature and to the validity of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka, Srinivas Pandruvada). - Make intel_pstate update the cpu_frequency tracepoint even if the frequency doesn't change to avoid confusing powertop (Rafael Wysocki). - Clean up the usage of __init/__initdata in intel_pstate, mark some of its internal variables as __read_mostly and drop an unused structure element from it (Jisheng Zhang, Carsten Emde). - Clean up the usage of some duplicate MSR symbols in intel_pstate and turbostat (Srinivas Pandruvada). - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay Adiga, Viresh Kumar, Ben Dooks). - Fix a regression (introduced during the 4.5 cycle) in the pcc-cpufreq driver by reverting the problematic commit (Andreas Herrmann). - Add support for Intel Denverton to intel_idle, clean up Broxton support in it and make it explicitly non-modular (Jacob Pan, Jan Beulich, Paul Gortmaker). - Add support for Denverton and Ivy Bridge server to the Intel RAPL power capping driver and make it more careful about the handing of MSRs that may not be present (Jacob Pan, Xiaolong Wang). - Fix resume from hibernation on x86-64 by making the CPU offline during resume avoid using MONITOR/MWAIT in the "play dead" loop which may lead to an inadvertent "revival" of a "dead" CPU and a page fault leading to a kernel crash from it (Rafael Wysocki). - Make memory management during resume from hibernation more straightforward (Rafael Wysocki). - Add debug features that should help to detect problems related to hibernation and resume from it (Rafael Wysocki, Chen Yu). - Clean up hibernation core somewhat (Rafael Wysocki). - Prevent KASAN from instrumenting the hibernation core which leads to large numbers of false-positives from it (James Morse). - Prevent PM (hibernate and suspend) notifiers from being called during the cleanup phase if they have not been called during the corresponding preparation phase which is possible if one of the other notifiers returns an error at that time (Lianwei Wang). - Improve suspend-related debug printout in the tasks freezer and clean up suspend-related console handling (Roger Lu, Borislav Petkov). - Update the AnalyzeSuspend script in the kernel sources to version 4.2 (Todd Brandt). - Modify the generic power domains framework to make it handle system suspend/resume better (Ulf Hansson). - Make the runtime PM framework avoid resuming devices synchronously when user space changes the runtime PM settings for them and improve its error reporting (Rafael Wysocki, Linus Walleij). - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus) and in the core, make some devfreq code explicitly non-modular and change some of it into tristate (Bartlomiej Zolnierkiewicz, Peter Chen, Paul Gortmaker). - Add DT support to the generic PM clocks management code and make it export some more symbols (Jon Hunter, Paul Gortmaker). - Make the PCI PM core code slightly more robust against possible driver errors (Andy Shevchenko). - Make it possible to change DESTDIR and PREFIX in turbostat (Andy Shevchenko)" * tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits) Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency" PM / hibernate: Introduce test_resume mode for hibernation cpufreq: export cpufreq_driver_resolve_freq() cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index() PCI / PM: check all fields in pci_set_platform_pm() cpufreq: acpi-cpufreq: use cached frequency mapping when possible cpufreq: schedutil: map raw required frequency to driver frequency cpufreq: add cpufreq_driver_resolve_freq() cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT intel_pstate: Update cpu_frequency tracepoint every time cpufreq: intel_pstate: clean remnant struct element PM / tools: scripts: AnalyzeSuspend v4.2 x86 / hibernate: Use hlt_play_dead() when resuming from hibernation cpufreq: powernv: Replacing pstate_id with frequency table index intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate() PM / hibernate: Image data protection during restoration PM / hibernate: Add missing braces in __register_nosave_region() PM / hibernate: Clean up comments in snapshot.c PM / hibernate: Clean up function headers in snapshot.c PM / hibernate: Add missing braces in hibernate_setup() ...	2016-07-26 17:29:07 -07:00
Kirill A. Shutemov	65c453778a	mm, rmap: account shmem thp pages Let's add ShmemHugePages and ShmemPmdMapped fields into meminfo and smaps. It indicates how many times we allocate and map shmem THP. NR_ANON_TRANSPARENT_HUGEPAGES is renamed to NR_ANON_THPS. Link: http://lkml.kernel.org/r/1466021202-61880-27-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-26 16:19:19 -07:00
Reza Arbab	a371d9f1cc	memory-hotplug: use zone_can_shift() for sysfs valid_zones attribute Since zone_can_shift() is being used to validate the target zone during onlining, it should also be used to determine the content of valid_zones. Link: http://lkml.kernel.org/r/1462816419-4479-4-git-send-email-arbab@linux.vnet.ibm.com Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewd-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrew Banman <abanman@sgi.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-26 16:19:19 -07:00
Linus Torvalds	015cd867e5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "There are a couple of new things for s390 with this merge request: - a new scheduling domain "drawer" is added to reflect the unusual topology found on z13 machines. Performance tests showed up to 8 percent gain with the additional domain. - the new crc-32 checksum crypto module uses the vector-galois-field multiply and sum SIMD instruction to speed up crc-32 and crc-32c. - proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in the generic vmlinux.lds linker script definitions. - kcov instrumentation support. A prerequisite for that is the inline assembly basic block cleanup, which is the reason for the net/iucv/iucv.c change. - support for 2GB pages is added to the hugetlbfs backend. Then there are two removals: - the oprofile hardware sampling support is dead code and is removed. The oprofile user space uses the perf interface nowadays. - the ETR clock synchronization is removed, this has been superseeded be the STP clock synchronization. And it always has been "interesting" code.. And the usual bug fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits) s390/pci: Delete an unnecessary check before the function call "pci_dev_put" s390/smp: clean up a condition s390/cio/chp : Remove deprecated create_singlethread_workqueue s390/chsc: improve channel path descriptor determination s390/chsc: sanitize fmt check for chp_desc determination s390/cio: make fmt1 channel path descriptor optional s390/chsc: fix ioctl CHSC_INFO_CU command s390/cio/device_ops: fix kernel doc s390/cio: allow to reset channel measurement block s390/console: Make preferred console handling more consistent s390/mm: fix gmap tlb flush issues s390/mm: add support for 2GB hugepages s390: have unique symbol for __switch_to address s390/cpuinfo: show maximum thread id s390/ptrace: clarify bits in the per_struct s390: stack address vs thread_info s390: remove pointless load within __switch_to s390: enable kcov support s390/cpumf: use basic block for ecctr inline assembly s390/hypfs: use basic block for diag inline assembly ...	2016-07-26 12:22:51 -07:00
Rafael J. Wysocki	fa70db3f19	Merge branches 'pm-core', 'pm-clk', 'pm-domains' and 'pm-pci' * pm-core: PM / runtime: Asynchronous "idle" in pm_runtime_allow() PM / runtime: print error when activating a child to unactive parent * pm-clk: PM / clk: Add support for adding a specific clock from device-tree PM / clk: export symbols for existing pm_clk_<...> API fcns * pm-domains: PM / Domains: Convert pm_genpd_init() to return an error code PM / Domains: Stop/start devices during system PM suspend/resume in genpd PM / Domains: Allow runtime PM during system PM phases PM / Runtime: Avoid resuming devices again in pm_runtime_force_resume() PM / Domains: Remove redundant pm_request_idle() call in genpd PM / Domains: Remove redundant wrapper functions for system PM PM / Domains: Allow genpd to power on during system PM phases * pm-pci: PCI / PM: check all fields in pci_set_platform_pm()	2016-07-25 13:45:27 +02:00
Mark Brown	efeb1a3ab9	Merge remote-tracking branches 'regmap/topic/bulk', 'regmap/topic/i2c', 'regmap/topic/iopoll', 'regmap/topic/irq' and 'regmap/topic/maintainers' into regmap-next	2016-07-15 13:44:47 +01:00
Rafael J. Wysocki	fe7450b05f	PM / runtime: Asynchronous "idle" in pm_runtime_allow() Arjan reports that it takes a relatively long time to enable runtime PM for multiple devices at system startup, because all writes to the "control" attribute in sysfs are handled synchronously and if the device is suspended as a result of the write, it will block until that operation is complete. That may be avoided by passing the RPM_ASYNC flag to rpm_idle() in pm_runtime_allow() which will make it execute the device's "idle" callback asynchronously, so writes to "control" changing it from "on" to "auto" will return without waiting. Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com>	2016-07-02 01:50:39 +02:00
Chen-Yu Tsai	5bf75b4497	regmap: Support bulk writes for devices without raw formatting When doing a bulk writes from a device which lacks raw I/O support we fall back to doing register at a time reads but we still use the raw formatters in order to render the data into the word size used by the device (since bulk reads still operate on the device word size rather than unsigned ints). This means that devices without raw formatting such as those that provide reg_read() are not supported. Provide handling for them by copying the values read into native endian values of the appropriate size. This complements commit `d5b98eb124` ("regmap: Support bulk reads for devices without raw formatting"). Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Mark Brown <broonie@kernel.org>	2016-06-29 19:48:00 +01:00
Ulf Hansson	7eb231c337	PM / Domains: Convert pm_genpd_init() to return an error code The are already cases when pm_genpd_init() can fail. Currently we hide the failures instead of propagating an error code, which is a better method. Moreover, to prepare for future changes like moving away from using a fixed array-size of the struct genpd_power_state, to instead dynamically allocate data for it, the pm_genpd_init() API needs to be able to return an error code, as allocation can fail. Current users of the pm_genpd_init() is thus requested to start dealing with error codes. In the transition phase, users will have to live with only error messages being printed to log. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-29 02:15:19 +02:00
Jon Hunter	498b5fdd40	PM / clk: Add support for adding a specific clock from device-tree Some drivers using the PM clocks framework need to add specific clocks from device-tree using a name by calling the functions of_clk_get_by_name() and then pm_clk_add_clk(). Rather than having drivers call both functions, add a helper function of_pm_clk_add_clk() that will call these functions so drivers can call a single function to add a specific clock from device-tree. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-28 00:42:10 +02:00
Linus Walleij	71723f9546	PM / runtime: print error when activating a child to unactive parent The code currently silently bails out with -EBUSY if you try to activate a child to an inactive parent. This typically happens when you have a runtime suspended parent and runtime resume your child, but forgot to set .ignore_children on the parent to true with pm_suspend_ignore_children(dev). Silently ignoring this error is not good as it gives rise to other strange behaviour like double-resume of devices after silently bailing out of the .runtime_resume() callback. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-28 00:40:30 +02:00
Guenter Roeck	d4ef930638	regmap-i2c: Use i2c block command only if register value width is 8 bit Chips with 16-bit registers don't usually work well with I2C block commands. For example, neither the LM75 datasheet nor the TMP102 datasheet mentions block command support, and in fact it does not work for any of those chips. Also, it is not clear how the block command would handle 16-bit SMBus operations in the fist place, since the data format associated with those commands is either little endian or big endian, which requires some kind of conversion to or from host byte order. Only use i2c block commands if both register and value width is 8 bit. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mark Brown <broonie@kernel.org>	2016-06-22 13:50:51 +01:00
Linus Torvalds	607117a153	Driver core fixes for 4.7-rc4 Here are a small number of debugfs, ISA, and one driver core fix for 4.7-rc4. All of these resolve reported issues. The ISA ones have spent the least amount of time in linux-next, sorry about that, I didn't realize they were regressions that needed to get in now (thanks to Thorsten for the prodding!) but they do all pass the 0-day bot tests. The others have been in linux-next for a while now. Full details about them are in the shortlog below. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAldlbeQACgkQMUfUDdst+ymnFACfaWhEKA/84jwNNHiim92diJrY zYsAoLOmpBw68yL6qTSZbcWJF4Flb6Xk =N8M2 -----END PGP SIGNATURE----- Merge tag 'driver-core-4.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are a small number of debugfs, ISA, and one driver core fix for 4.7-rc4. All of these resolve reported issues. The ISA ones have spent the least amount of time in linux-next, sorry about that, I didn't realize they were regressions that needed to get in now (thanks to Thorsten for the prodding!) but they do all pass the 0-day bot tests. The others have been in linux-next for a while now. Full details about them are in the shortlog below" * tag 'driver-core-4.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: isa: Dummy isa_register_driver should return error code isa: Call isa_bus_init before dependent ISA bus drivers register watchdog: ebc-c384_wdt: Allow build for X86_64 iio: stx104: Allow build for X86_64 gpio: Allow PC/104 devices on X86_64 isa: Allow ISA-style drivers on modern systems base: make module_create_drivers_dir race-free debugfs: open_proxy_open(): avoid double fops release debugfs: full_proxy_open(): free proxy on ->open() failure kernel/kcov: unproxify debugfs file's fops	2016-06-18 06:04:01 -10:00
William Breathitt Gray	32a5a0c047	isa: Call isa_bus_init before dependent ISA bus drivers register The isa_bus_init function must be called before drivers which utilize the ISA bus driver are registered. A race condition for initilization exists if device_initcall is used (the isa_bus_init callback is placed in the same initcall level as dependent drivers which use module_init). This patch ensures that isa_bus_init is called first by utilizing postcore_initcall in favor of device_initcall. Fixes: `a5117ba7da` ("[PATCH] Driver model: add ISA bus") Cc: Rene Herman <rene.herman@keyaccess.nl> Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-06-17 20:47:11 -07:00
William Breathitt Gray	3a4955111a	isa: Allow ISA-style drivers on modern systems Several modern devices, such as PC/104 cards, are expected to run on modern systems via an ISA bus interface. Since ISA is a legacy interface for most modern architectures, ISA support should remain disabled in general. Support for ISA-style drivers should be enabled on a per driver basis. To allow ISA-style drivers on modern systems, this patch introduces the ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now build conditionally on the ISA_BUS_API Kconfig option, which defaults to the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the ISA_BUS_API Kconfig option to be selected on architectures which do not enable ISA (e.g. X86_64). The ISA_BUS Kconfig option is currently only implemented for X86 architectures. Other architectures may have their own ISA_BUS Kconfig options added as required. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-06-17 20:21:12 -07:00
Rafael J. Wysocki	9d066a2527	Merge branches 'pm-opp' and 'pm-cpufreq-fixes' * pm-opp: PM / OPP: Add 'UNKNOWN' status for shared_opp in struct opp_table * pm-cpufreq-fixes: cpufreq: intel_pstate: Adjust _PSS[0] freqeuency if needed	2016-06-18 01:55:13 +02:00
Viresh Kumar	79ee2e8f73	PM / OPP: Add 'UNKNOWN' status for shared_opp in struct opp_table dev_pm_opp_get_sharing_cpus() returns 0 even in the case when the OPP core doesn't know whether or not the table is shared. It works on the majority of platforms, where the OPP table is never created before invoking the function and then -ENODEV is returned by it. But in the case of one platform (Jetson TK1) at least, the situation is a bit different. The OPP table has been created (somehow) before dev_pm_opp_get_sharing_cpus() is called and it returns 0. Its caller treats that as 'the CPUs don't share OPPs' and that leads to degraded performance. Fix this by converting 'shared_opp' in struct opp_table to an enum and making dev_pm_opp_get_sharing_cpus() return -EINVAL in case when the value of that field is "access unknown", so that the caller can handle it accordingly (cpufreq-dt considers that as 'all CPUs share the table', for example). Fixes: `6f707daa38` "PM / OPP: Add dev_pm_opp_get_sharing_cpus()" Reported-and-tested-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw : Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-16 15:50:36 +02:00
Ulf Hansson	122a22377a	PM / Domains: Stop/start devices during system PM suspend/resume in genpd Not all subsystems/drivers that manages devices attached to a genpd makes use of the pm_runtime_force_suspend\|resume() helper functions to deal with system PM suspend/resume. In cases like these and when genpd's ->stop\|start() callbacks are used for the device, invoke the pm_runtime_force_suspend\|resume() helper functions from genpd's "noirq" system PM callbacks. In this way we make sure to "stop" the device on suspend and to "start" it on resume. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-16 15:14:37 +02:00
Ulf Hansson	4d23a5e848	PM / Domains: Allow runtime PM during system PM phases In cases when a PM domain isn't powered off when genpd's ->prepare() callback is invoked, genpd runtime resumes and disables runtime PM for the device. This behaviour was needed when genpd managed intermediate states during the power off sequence, as to maintain proper low power states of devices during system PM suspend/resume. Commit `ba2bbfbf63` (PM / Domains: Remove intermediate states from the power off sequence), enables genpd to improve its behaviour in that respect. The PM core disables runtime PM at __device_suspend_late() before it calls a system PM "late" callback for a device. When resuming a device, after a corresponding "early" callback has been invoked, the PM core re-enables runtime PM. By changing genpd to allow runtime PM according to the same system PM phases as the PM core, devices can be runtime resumed by their corresponding subsystem/driver when really needed. In this way, genpd no longer need to runtime resume the device from its ->prepare() callback. In most cases that avoids unnecessary and energy- wasting operations of runtime resuming devices that have nothing to do, only to runtime suspend them shortly after. Although, because of changing this behaviour in genpd and due to that genpd powers on the PM domain unconditionally in the system PM resume "noirq" phase, it could potentially cause a PM domain to stay powered on even if it's unused after the system has resumed. To avoid this, schedule a power off work when genpd's system PM ->complete() callback has been invoked for the last device in the PM domain. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-16 15:14:36 +02:00
Ulf Hansson	9f5b52747d	PM / Runtime: Avoid resuming devices again in pm_runtime_force_resume() If the runtime PM status of the device isn't RPM_SUSPENDED, prevent the pm_runtime_force_resume() from invoking the ->runtime_resume() callback for the device, as it's not the expected behaviour from the subsystem/driver. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-16 15:14:36 +02:00
Ulf Hansson	9b002b8f0e	PM / Domains: Remove redundant pm_request_idle() call in genpd The PM core increases the runtime PM usage count at the system PM prepare phase. Later when the system resumes, it does a pm_runtime_put() in the complete phase, which in addition to decrementing the usage count, does the equivalent of a pm_request_idle(). Therefore the call to pm_request_idle() from within genpd's ->complete() callback is redundant, so remove it. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-16 15:14:34 +02:00
Ulf Hansson	8001885389	PM / Domains: Remove redundant wrapper functions for system PM Due to the previous changes in genpd, which removed the suspend_power_off flag, several of the system PM callbacks no longer do any additional checks but only invoke corresponding pm_generic_* helper functions. To clean up the code, drop these wrapper functions as they have become redundant. Instead, assign the system PM callbacks directly to the pm_generic_*() helper functions. While changing this, it has bocame clear that some of the current system PM callbacks in genpd invoke wrong driver callbacks. For example, the genpd's ->restore() callback invokes pm_generic_resume(), while that should be pm_generic_restore(). Fix that as well. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-16 15:01:43 +02:00
Ulf Hansson	39dd0f234f	PM / Domains: Allow genpd to power on during system PM phases If a PM domain is powered off when the first device starts its system PM prepare phase, genpd prevents any further attempts to power on the PM domain during the following system PM phases. Not until the system PM complete phase is finalized for all devices in the PM domain, genpd again allows it to be powered on. This behaviour needs to be changed, as a subsystem/driver for a device in the same PM domain may still need to be able to serve requests in some of the system PM phases. Accordingly, it may need to runtime resume its device and thus also request the corresponding PM domain to be powered on. To deal with these scenarios, let's make the device operational in the system PM prepare phase by runtime resuming it, no matter if the PM domain is powered on or off. Changing this also enables us to remove genpd's suspend_power_off flag, as it's being used to track this condition. Additionally, we must allow the PM domain to be powered on via runtime PM during the system PM phases. This change also requires a fix in the AMD ACP (Audio CoProcessor) drm driver. It registers a genpd to model the ACP as a PM domain, but unfortunately it's also abuses genpd's "internal" suspend_power_off flag to deal with a corner case at system PM resume. More precisely, the so called SMU block powers on the ACP at system PM resume, unconditionally if it's being used or not. This may lead to that genpd's internal status of the power state, may not correctly reflect the power state of the HW after a system PM resume. Because of changing the behaviour of genpd, by runtime resuming devices in the prepare phase, the AMD ACP drm driver no longer have to deal with this corner case. So let's just drop the related code in this driver. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Acked-by: Maruthi Bayyavarapu <maruthi.bayyavarapu@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-16 15:01:43 +02:00
Jiri Slaby	7e1b1fc4da	base: make module_create_drivers_dir race-free Modules which register drivers via standard path (driver_register) in parallel can cause a warning: WARNING: CPU: 2 PID: 3492 at ../fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80 sysfs: cannot create duplicate filename '/module/saa7146/drivers' Modules linked in: hexium_gemini(+) mxb(+) ... ... Call Trace: ... [<ffffffff812e63a2>] sysfs_warn_dup+0x62/0x80 [<ffffffff812e6487>] sysfs_create_dir_ns+0x77/0x90 [<ffffffff8140f2c4>] kobject_add_internal+0xb4/0x340 [<ffffffff8140f5b8>] kobject_add+0x68/0xb0 [<ffffffff8140f631>] kobject_create_and_add+0x31/0x70 [<ffffffff8157a703>] module_add_driver+0xc3/0xd0 [<ffffffff8155e5d4>] bus_add_driver+0x154/0x280 [<ffffffff815604c0>] driver_register+0x60/0xe0 [<ffffffff8145bed0>] __pci_register_driver+0x60/0x70 [<ffffffffa0273e14>] saa7146_register_extension+0x64/0x90 [saa7146] [<ffffffffa0033011>] hexium_init_module+0x11/0x1000 [hexium_gemini] ... As can be (mostly) seen, driver_register causes this call sequence: -> bus_add_driver -> module_add_driver -> module_create_drivers_dir The last one creates "drivers" directory in /sys/module/<...>. When this is done in parallel, the directory is attempted to be created twice at the same time. This can be easily reproduced by loading mxb and hexium_gemini in parallel: while :; do modprobe mxb & modprobe hexium_gemini wait rmmod mxb hexium_gemini saa7146_vv saa7146 done saa7146 calls pci_register_driver for both mxb and hexium_gemini, which means /sys/module/saa7146/drivers is to be created for both of them. Fix this by a new mutex in module_create_drivers_dir which makes the test-and-create "drivers" dir atomic. I inverted the condition and removed 'return' to avoid multiple unlocks or a goto. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Fixes: `fe480a2675` (Modules: only add drivers/ direcory if needed) Cc: v2.6.21+ <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-06-15 19:21:31 -07:00
Paul Gortmaker	29b968b2af	PM / clk: export symbols for existing pm_clk_<...> API fcns While trying to convert a DMA driver from bool to tristate, we encountered the following: ERROR: "pm_clk_add_clk" [drivers/dma/tegra210-adma.ko] undefined! ERROR: "pm_clk_create" [drivers/dma/tegra210-adma.ko] undefined! ERROR: "pm_clk_destroy" [drivers/dma/tegra210-adma.ko] undefined! ERROR: "pm_clk_suspend" [drivers/dma/tegra210-adma.ko] undefined! ERROR: "pm_clk_resume" [drivers/dma/tegra210-adma.ko] undefined! Since in principle there is nothing preventing these functions from being used in modular code as well as builtin, we add the export of them. We expand the scope to also include: pm_clk_add of_pm_clk_add_clks pm_clk_remove pm_clk_remove_clk pm_clk_init pm_clk_runtime_suspend pm_clk_runtime_resume pm_clk_add_notifier ...since these functions are also non-static and presumably form part of the existing API used by other drivers that may become modular in the future. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-15 01:42:15 +02:00
Heiko Carstens	a62247e1f5	topology/sysfs: provide drawer id and siblings attributes The s390 cpu topology gained another hierarchy level. The top level is now called drawer and contains several books. A book used to be the top level. In order to expose the cpu topology to user space allow to create new sysfs attributes dependent on CONFIG_SCHED_DRAWER which an architecture may define and select. These additional attributes will be available: /sys/devices/system/cpu/cpuX/topology/drawer_id /sys/devices/system/cpu/cpuX/topology/drawer_siblings /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2016-06-13 15:58:27 +02:00
Laxman Dewangan	ccc1256192	regmap: irq: Add support to call client specific pre/post interrupt service Regmap irq implements the generic interrupt service routine which is common for most of devices. Some devices, like MAX77620, MAX20024 needs the special handling before and after servicing the interrupt as generic. For the example, MAX77620 programming guidelines for interrupt servicing says: 1. When interrupt occurs from PMIC, mask the PMIC interrupt by setting GLBLM. 2. Read IRQTOP and service the interrupt accordingly. 3. Once all interrupts has been checked and serviced, the interrupt service routine un-masks the hardware interrupt line by clearing GLBLM. The step (2) is implemented in regmap irq as generic routine. For step (1) and (3), add callbacks from regmap irq to client driver to handle chip specific configurations. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org>	2016-06-03 00:41:15 +01:00
Linus Torvalds	877c057d2b	More power management updates for v4.7-rc1 - Stable-candidate cpuidle fix to make it check the right variable when deciding whether or not to enable interrupts on the local CPU so as to avoid enabling iterrupts too early in some cases if the system has both coupled and per-core idle states (Daniel Lezcano). - Stable-candidate PM core fix to make it handle failures at the "late suspend" stage of device suspend consistently for all devices regardless of whether or not async suspend/resume is enabled for them (Rafael Wysocki). - Cleanups in the cpufreq core, the schedutil governor and the intel_pstate driver (Rafael Wysocki, Pankaj Gupta, Viresh Kumar). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJXRglcAAoJEILEb/54YlRxCwEQAKQNLO5wrNnOQmVrMHaOk1XH 4zT+AdggLRVgBld4d4Vcli2zJPZfpnZuzjJdwTB5zLgQ4WIcBb6meOfH87XGqRyJ o4ksyEUhpvDk8AdlmTA5CvvjFuydPJG5ZUSiM035XRT9heebvhgyaMBnT3ucXbq9 7LhNhCQ+a8arndt9ePO7tZnFfQQUbwNJ2BDVuH5DJBqMIFOo2/Kpag43CdFWlWZT jnWaleDCjSmanuJ/45bFJHJeSZ7PK2etnArfzKtb9QLSGnuEfFPdHuUzJYo5dkP7 UBeYA94hhfR3f5FJIqNlF3N+eLEX1idpwxc8+CJLLDKDd1ZCBrbLoz5fwM+fVn0h AfmyR+J1czcbiphsmpViOYDRrKdiQVkbP6SpBswvgMCZAcNDF2bxhzOlcuTUc+u0 8xsjWOtArL6uvzsAHa1HY6hhgUn9FB8m20HX+DmS2/zzqyzoRefenoyVcuLsAhXC fm+sARQ7tvy3OoGRQ9mloWgv2X5iQUY5IVjOG2amIhbUvVmKQutPjVTGTwHqmmcb 2nNYptLsTA6crvnexPcPHY+OFjkQl/omtfaMx+OJl63yhln5ibveGOfZ6F8sPdoB bRqHuHoK/xh9hSNwj117ZFzq1nm54mLjh0Yhw3EXFcV4I9vdsTp8/WeNThGvT17j M+6PDXyjlwh3HZpGm+HW =3vtL -----END PGP SIGNATURE----- Merge tag 'pm-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These are two stable-candidate fixes (PM core, cpuidle) and a bunch of cpufreq cleanups. Specifics: - Stable-candidate cpuidle fix to make it check the right variable when deciding whether or not to enable interrupts on the local CPU so as to avoid enabling iterrupts too early in some cases if the system has both coupled and per-core idle states (Daniel Lezcano). - Stable-candidate PM core fix to make it handle failures at the "late suspend" stage of device suspend consistently for all devices regardless of whether or not async suspend/resume is enabled for them (Rafael Wysocki). - Cleanups in the cpufreq core, the schedutil governor and the intel_pstate driver (Rafael Wysocki, Pankaj Gupta, Viresh Kumar)" * tag 'pm-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / sleep: Handle failures in device_suspend_late() consistently cpufreq: schedutil: Improve prints messages with pr_fmt cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter() cpufreq: simplified goto out in cpufreq_register_driver() cpufreq: governor: CPUFREQ_GOV_STOP never fails cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails intel_pstate: Simplify conditional in intel_pstate_set_policy()	2016-05-25 15:29:21 -07:00
Rafael J. Wysocki	4c2628cd75	Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-core' * pm-cpufreq: cpufreq: schedutil: Improve prints messages with pr_fmt cpufreq: simplified goto out in cpufreq_register_driver() cpufreq: governor: CPUFREQ_GOV_STOP never fails cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails intel_pstate: Simplify conditional in intel_pstate_set_policy() * pm-cpuidle: cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter() * pm-core: PM / sleep: Handle failures in device_suspend_late() consistently	2016-05-25 21:54:45 +02:00
Linus Torvalds	3aa2fc1667	driver core update for 4.7-rc1 Here's the "big" driver core update for 4.7-rc1. Mostly just debugfs changes, the long-known and messy races with removing debugfs files should be fixed thanks to the great work of Nicolai Stange. We also have some isa updates in here (the x86 maintainers told me to take it through this tree), a new warning when we run out of dynamic char major numbers, and a few other assorted changes, details in the shortlog. All have been in linux-next for some time with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlc/0mwACgkQMUfUDdst+ynjXACgjNxR5nMUiM8ZuuD0i4Xj7VXd hnIAoM08+XDCv41noGdAcKv+2WZVZWMC =i+0H -----END PGP SIGNATURE----- Merge tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here's the "big" driver core update for 4.7-rc1. Mostly just debugfs changes, the long-known and messy races with removing debugfs files should be fixed thanks to the great work of Nicolai Stange. We also have some isa updates in here (the x86 maintainers told me to take it through this tree), a new warning when we run out of dynamic char major numbers, and a few other assorted changes, details in the shortlog. All have been in linux-next for some time with no reported issues" * tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits) Revert "base: dd: don't remove driver_data in -EPROBE_DEFER case" gpio: ws16c48: Utilize the ISA bus driver gpio: 104-idio-16: Utilize the ISA bus driver gpio: 104-idi-48: Utilize the ISA bus driver gpio: 104-dio-48e: Utilize the ISA bus driver watchdog: ebc-c384_wdt: Utilize the ISA bus driver iio: stx104: Utilize the module_isa_driver and max_num_isa_dev macros iio: stx104: Add X86 dependency to STX104 Kconfig option Documentation: Add ISA bus driver documentation isa: Implement the max_num_isa_dev macro isa: Implement the module_isa_driver macro pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS isa: Decouple X86_32 dependency from the ISA Kconfig option driver-core: use 'dev' argument in dev_dbg_ratelimited stub base: dd: don't remove driver_data in -EPROBE_DEFER case kernfs: Move faulting copy_user operations outside of the mutex devcoredump: add scatterlist support debugfs: unproxify files created through debugfs_create_u32_array() debugfs: unproxify files created through debugfs_create_blob() debugfs: unproxify files created through debugfs_create_bool() ...	2016-05-20 21:26:15 -07:00
Rafael J. Wysocki	3a17fb329d	PM / sleep: Handle failures in device_suspend_late() consistently Grygorii Strashko reports: The PM runtime will be left disabled for the device if its .suspend_late() callback fails and async suspend is not allowed for this device. In this case device will not be added in dpm_late_early_list and dpm_resume_early() will ignore this device, as result PM runtime will be disabled for it forever (side effect: after 8 subsequent failures for the same device the PM runtime will be reenabled due to disable_depth overflow). To fix this problem, add devices to dpm_late_early_list regardless of whether or not device_suspend_late() returns errors for them. That will ensure failures in there to be handled consistently for all devices regardless of their async suspend/resume status. Reported-by: Grygorii Strashko <grygorii.strashko@ti.com> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: All applicable <stable@vger.kernel.org>	2016-05-20 23:09:49 +02:00
Linus Torvalds	16490980e3	Generic device properties framework update for v4.7-rc1 Just one commit reworking the handling of built-in properties initialization and updating a few drivers in accordance with the core framework changes. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJXOjO2AAoJEILEb/54YlRxiksP/jJjZZ5B7r7iBCLhtornA9L4 I54t0X4HcxmkBd2k5VxBE8k0bzMo/Tx52Bm/YnKxKQS7CSVPr7Lhrj3pH2NoQO5i S5T4Wdi8AYK/0LXcCwEfyMCAr7M9mstc+v2mQu1ZPtGKSI/c9+3r3zCrtSbxjwK8 RU4RSa3Y27drL8wbyc2PvjytYcbmwM4BsLOL8v4noUe2WussN4LEjiN3q4Y+s6O8 gmSPCZDY5Qg1Nc1YBse+lPsHHUng9cgIHzAnQkN5CnM36KLoNG3KaYGu+0ZxSnMe Ed+NaCZZC+u43r7vaJaOpnE+1jxYLOw4w7ail6GHveaI1EYKUtaRF8KcUDn1wwkU MxYslt5i8LS0ar2DPJ/RgHZ2ajaZAPjdie/+PJQwO6e5PqQsCDOLia8zWTS6HEMf DTirJrRfy2o1QR693I8MSwm5hsFiOPTeYXexWxLAe+LtUN277qaZrFAp7DSSEKm1 p53K9xaWmd2gJUTsmnim0TlTA2j9PO9dCJeXJDBa2B6QR11oQ160BvZ6/kk5U/FZ /XSbixwvRs2i6H/lVNgqyGmJdIAt75HcCUwNw4QI76nHrxx1BirA2DNbrGAlsNAF eBYaZ7pYtUN4/nZh+jYJWki7pY46F/7OgGoQQzM2LR6W8dz7Cwt/OQyt0+ekLhN4 prNP1Br++rFeCIc7kFOk =O8T9 -----END PGP SIGNATURE----- Merge tag 'device-properties-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties update from Rafael Wysocki: "Generic device properties framework update. Just one commit reworking the handling of built-in properties initialization and updating a few drivers in accordance with the core framework changes" * tag 'device-properties-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: device property: don't bother the drivers with struct property_set	2016-05-16 19:51:04 -07:00
Linus Torvalds	d57d394319	Power management material for v4.7-rc1 - New cpufreq "schedutil" governor (making decisions based on CPU utilization information provided by the scheduler and capable of switching CPU frequencies right away if the underlying driver supports that) and support for fast frequency switching in the acpi-cpufreq driver (Rafael Wysocki). - Consolidation of CPU frequency management on ARM platforms allowing them to get rid of some platform-specific boilerplate code if they are going to use the cpufreq-dt driver (Viresh Kumar, Finley Xiao, Marc Gonzalez). - Support for ACPI _PPC and CPU frequency limits in the intel_pstate driver (Srinivas Pandruvada). - Fixes and cleanups in the cpufreq core and generic governor code (Rafael Wysocki, Sai Gurrappadi). - intel_pstate driver optimizations and cleanups (Rafael Wysocki, Philippe Longepe, Chen Yu, Joe Perches). - cpufreq powernv driver fixes and cleanups (Akshay Adiga, Shilpasri Bhat). - cpufreq qoriq driver fixes and cleanups (Jia Hongtao). - ACPI cpufreq driver cleanups (Viresh Kumar). - Assorted cpufreq driver updates (Ashwin Chaugule, Geliang Tang, Javier Martinez Canillas, Paul Gortmaker, Sudeep Holla). - Assorted cpufreq fixes and cleanups (Joe Perches, Arnd Bergmann). - Fixes and cleanups in the OPP (Operating Performance Points) framework, mostly related to OPP sharing, and reorganization of OF-dependent code in it (Viresh Kumar, Arnd Bergmann, Sudeep Holla). - New "passive" governor for devfreq (for SoC subsystems that will rely on someone else for the management of their power resources) and consolidation of devfreq support for Exynos platforms, coding style and typo fixes for devfreq (Chanwoo Choi, MyungJoo Ham). - PM core fixes and cleanups, mostly to make it work better with the generic power domains (genpd) framework, and updates for that framework (Ulf Hansson, Thierry Reding, Colin Ian King). - Intel Broxton support for the intel_idle driver (Len Brown). - cpuidle core optimization and fix (Daniel Lezcano, Dave Gerlach). - ARM cpuidle cleanups (Jisheng Zhang). - Intel Kabylake support for the RAPL power capping driver (Jacob Pan). - AVS (Adaptive Voltage Switching) rockchip-io driver update (Heiko Stuebner). - Updates for the cpupower tool (Arjun Sreedharan, Colin Ian King, Mattia Dongili, Thomas Renninger). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJXOjLgAAoJEILEb/54YlRxfn0P/RbSPpNlUNBIE8DFrdD9jRdJ TIpZ7uiHi9tU1ZF17UBbb/SwuWfYVnVmiorZGRfFOtGaoqh0HFZ/nplDz99rK0ku vW2OnbojMQEUMU3IcUT1y4BsSl0H23f7ZOKrdprALeWxDQmbgnYjrE6vkX6hRtld A8eeZvIEJ5CzV8S+9aOOOpojW2yXk5dYGdZ7gpQdoM0n7zVLyPnNucJoha3BYmOG FwKEIe05RpIhfLfGT0CXIRcOzwAZ6ZWKgOrXUrx/AadPbvu/TP9zkI0djYI8ukyv z2oiO/GExoeGVuUzvy8vY5SiH4NQvViftFzMZepcsmjxmVglohMPRL8VLjZIBckk DDcqH9e0OQI20jjYT1vIf5+JWBvLxuQfGtyzI0S+sE/elB1zI/3O8p+8N2CuF5n+ my2dawIewnHI/0AdSpJ+K7DVrfwPHAX19axtPX3dJSLh2OuHCPNlAtbxRGAriBfH Zv9NETxlrch69o2AD4K54DErWV1FsYLznzK5Zms6MC2Ispbb+oiYpacTlZblznvb H5U2SSNlA5Niir3vVJ01nKRtzxlWoi67CQxbYrGhlaR0nTTxf9HqWgcSiTZrn7Pv hs+LA2aUfMf3JGjStdORS7S8biQSid5vypfkglpWLZBKHNC9BqqZd9gSM+jF3FVh ps4mMM4UXY4hnoFDkMBI =WM89 -----END PGP SIGNATURE----- Merge tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The majority of changes go into the cpufreq subsystem this time. To me, quite obviously, the biggest ticket item is the new "schedutil" governor. Interestingly enough, it's the first new cpufreq governor since the beginning of the git era (except for some out-of-the-tree ones). There are two main differences between it and the existing governors. First, it uses the information provided by the scheduler directly for making its decisions, so it doesn't have to track anything by itself. Second, it can invoke drivers (supporting that feature) to adjust CPU performance right away without having to spawn work items to be executed in process context or similar. Currently, the acpi-cpufreq driver is the only one supporting that mode of operation, but then it is used on a large number of systems. The "schedutil" governor as included here is very simple and mostly regarded as a foundation for future work on the integration of the scheduler with CPU power management (in fact, there is work in progress on top of it already). Nevertheless it works and the preliminary results obtained with it are encouraging. There also is some consolidation of CPU frequency management for ARM platforms that can add their machine IDs the the new stub dt-platdev driver now and that will take care of creating the requisite platform device for cpufreq-dt, so it is not necessary to do that in platform code any more. Several ARM platforms are switched over to using this generic mechanism. In addition to that, the intel_pstate driver is now going to respect CPU frequency limits set by the platform firmware (or a BMC) and provided via the ACPI _PPC object. The devfreq subsystem is getting a new "passive" governor for SoCs subsystems that will depend on somebody else to manage their voltage rails and its support for Samsung Exynos SoCs is consolidated. The rest is support for new hardware (Intel Broxton support in intel_idle for one example), bug fixes, optimizations and cleanups in a number of places. Specifics: - New cpufreq "schedutil" governor (making decisions based on CPU utilization information provided by the scheduler and capable of switching CPU frequencies right away if the underlying driver supports that) and support for fast frequency switching in the acpi-cpufreq driver (Rafael Wysocki) - Consolidation of CPU frequency management on ARM platforms allowing them to get rid of some platform-specific boilerplate code if they are going to use the cpufreq-dt driver (Viresh Kumar, Finley Xiao, Marc Gonzalez) - Support for ACPI _PPC and CPU frequency limits in the intel_pstate driver (Srinivas Pandruvada) - Fixes and cleanups in the cpufreq core and generic governor code (Rafael Wysocki, Sai Gurrappadi) - intel_pstate driver optimizations and cleanups (Rafael Wysocki, Philippe Longepe, Chen Yu, Joe Perches) - cpufreq powernv driver fixes and cleanups (Akshay Adiga, Shilpasri Bhat) - cpufreq qoriq driver fixes and cleanups (Jia Hongtao) - ACPI cpufreq driver cleanups (Viresh Kumar) - Assorted cpufreq driver updates (Ashwin Chaugule, Geliang Tang, Javier Martinez Canillas, Paul Gortmaker, Sudeep Holla) - Assorted cpufreq fixes and cleanups (Joe Perches, Arnd Bergmann) - Fixes and cleanups in the OPP (Operating Performance Points) framework, mostly related to OPP sharing, and reorganization of OF-dependent code in it (Viresh Kumar, Arnd Bergmann, Sudeep Holla) - New "passive" governor for devfreq (for SoC subsystems that will rely on someone else for the management of their power resources) and consolidation of devfreq support for Exynos platforms, coding style and typo fixes for devfreq (Chanwoo Choi, MyungJoo Ham) - PM core fixes and cleanups, mostly to make it work better with the generic power domains (genpd) framework, and updates for that framework (Ulf Hansson, Thierry Reding, Colin Ian King) - Intel Broxton support for the intel_idle driver (Len Brown) - cpuidle core optimization and fix (Daniel Lezcano, Dave Gerlach) - ARM cpuidle cleanups (Jisheng Zhang) - Intel Kabylake support for the RAPL power capping driver (Jacob Pan) - AVS (Adaptive Voltage Switching) rockchip-io driver update (Heiko Stuebner) - Updates for the cpupower tool (Arjun Sreedharan, Colin Ian King, Mattia Dongili, Thomas Renninger)" * tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (112 commits) intel_pstate: Clean up get_target_pstate_use_performance() intel_pstate: Use sample.core_avg_perf in get_avg_pstate() intel_pstate: Clarify average performance computation intel_pstate: Avoid unnecessary synchronize_sched() during initialization cpufreq: schedutil: Make default depend on CONFIG_SMP cpufreq: powernv: del_timer_sync when global and local pstate are equal cpufreq: powernv: Move smp_call_function_any() out of irq safe block intel_pstate: Clean up intel_pstate_get() cpufreq: schedutil: Make it depend on CONFIG_SMP cpufreq: governor: Fix handling of special cases in dbs_update() PM / OPP: Move CONFIG_OF dependent code in a separate file cpufreq: intel_pstate: Ignore _PPC processing under HWP cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table PM / OPP: add non-OF versions of dev_pm_opp_{cpumask_, }remove_table cpufreq: tango: Use generic platdev driver PM / OPP: pass cpumask by reference cpufreq: Fix GOV_LIMITS handling for the userspace governor cpupower: fix potential memory leak PM / devfreq: style/typo fixes PM / devfreq: exynos: Add the detailed correlation for Exynos5422 bus ..	2016-05-16 19:17:22 -07:00
Rafael J. Wysocki	27c4a1c5ef	Merge branches 'pm-avs', 'pm-clk', 'powercap' and 'pm-tools' * pm-avs: PM / AVS: rockchip-io: make io-domains a child of the GRF * pm-clk: PM / clk: ensure we don't allocate a -ve size of count clks * powercap: powercap/intel_rapl: Add support for Kabylake * pm-tools: cpupower: fix potential memory leak cpupower: Add cpuidle parts into library cpupowerutils: bench: trivial fix of spelling mistake on "average" Fix cpupower manpages "NAME" section cpupower: bench: parse.c: fix several resource leaks Honour user's LDFLAGS	2016-05-16 14:31:56 +02:00
Rafael J. Wysocki	aa24781b1c	Merge branches 'pm-core' and 'pm-domains' * pm-core: PM / sleep: Drop unused `info' variable PM / Runtime: Move ignore_children flag under CONFIG_PM PM / Runtime: Fix error path in pm_runtime_force_resume() * pm-domains: PM / Domains: Drop unnecessary wakeup code from pm_genpd_prepare() PM / Domains: Remove redundant pm_runtime_get\|put*() in pm_genpd_prepare() PM / Domains: Remove ->save\|restore_state() callbacks PM / Domains: Rename pm_genpd_runtime_suspend\|resume() PM / Domains: Rename stop_ok to suspend_ok for the genpd governor	2016-05-16 14:31:29 +02:00
Rafael J. Wysocki	29cff1844a	Merge branch 'pm-opp' * pm-opp: PM / OPP: Move CONFIG_OF dependent code in a separate file PM / OPP: add non-OF versions of dev_pm_opp_{cpumask_, }remove_table PM / OPP: pass cpumask by reference PM / OPP: Add dev_pm_opp_get_sharing_cpus() PM / OPP: Mark cpumask as const in dev_pm_opp_set_sharing_cpus() PM / OPP: -ENOSYS is applicable only to syscalls PM / OPP: Mark shared-opp for non-dt case PM / OPP: Relocate dev_pm_opp_set_sharing_cpus() PM / OPP: dev_pm_opp_set_sharing_cpus() doesn't depend on CONFIG_OF PM / OPP: Add missing doc style comments PM / OPP: Propagate the error returned by _find_opp_table()	2016-05-16 14:30:14 +02:00
Mark Brown	d4ab78d707	Merge remote-tracking branches 'regmap/topic/doc' and 'regmap/topic/flat' into regmap-next	2016-05-13 10:36:14 +01:00
Mark Brown	2a2cd52190	Merge remote-tracking branches 'regmap/fix/be', 'regmap/fix/doc' and 'regmap/fix/spmi' into regmap-linus	2016-05-13 10:36:10 +01:00
Mark Brown	066a0e0b49	Merge remote-tracking branch 'regmap/fix/mmio' into regmap-linus	2016-05-13 10:36:09 +01:00
Rafael J. Wysocki	dab2e29402	Merge back new device properties material for v4.7.	2016-05-06 22:07:33 +02:00
Rafael J. Wysocki	c8541203a6	Merge back new material for v4.7.	2016-05-06 22:05:16 +02:00
Viresh Kumar	f47b72a15a	PM / OPP: Move CONFIG_OF dependent code in a separate file Recently, a few issues were noticed in the code where CONFIG_OF wasn't consistently used for many routines. The core file is big enough now and ifdef hackery makes it less readable. Move OF-specific code to another file and compile that only if CONFIG_OF is enabled. Compile-tested: - For ARM (exynos) with CONFIG_OF enabled - For X86 with CONFIG_OF disabled (have to enable CONFIG_PM_OPP separately) No functional changes. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-06 13:22:49 +02:00

1 2 3 4 5 ...

3125 Commits