linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	2fd8774c79	Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fixes from Konrad Rzeszutek Wilk: "This has one fix to make i915 work when using Xen SWIOTLB, and a feature from Geert to aid in debugging of devices that can't do DMA outside the 32-bit address space. The feature from Geert is on top of v4.10 merge window commit (specifically you pulling my previous branch), as his changes were dependent on the Documentation/ movement patches. I figured it would just easier than me trying than to cherry-pick the Documentation patches to satisfy git. The patches have been soaking since 12/20, albeit I updated the last patch due to linux-next catching an compiler error and adding an Tested-and-Reported-by tag" * 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Export swiotlb_max_segment to users swiotlb: Add swiotlb=noforce debug option swiotlb: Convert swiotlb_force from int to enum x86, swiotlb: Simplify pci_swiotlb_detect_override()	2017-01-06 10:53:21 -08:00
Konrad Rzeszutek Wilk	7453c549f5	swiotlb: Export swiotlb_max_segment to users So they can figure out what is the optimal number of pages that can be contingously stitched together without fear of bounce buffer. We also expose an mechanism for sub-users of SWIOTLB API, such as Xen-SWIOTLB to set the max segment value. And lastly if swiotlb=force is set (which mandates we bounce buffer everything) we set max_segment so at least we can bounce buffer one 4K page instead of a giant 512KB one for which we may not have space. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: Juergen Gross <jgross@suse.com>	2017-01-06 13:00:01 -05:00
Chris Wilson	2471eb5fb6	drm/i915: Prevent timeline updates whilst performing reset As the fence may be signaled concurrently from an interrupt on another device, it is possible for the list of requests on the timeline to be modified as we walk it. Take both (the context's timeline and the global timeline) locks to prevent such modifications. Fixes: `80b204bce8` ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk (cherry picked from commit `00c25e3f40`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2017-01-03 11:41:57 +02:00
Chris Wilson	64d1461ce0	drm/i915: Silence allocation failure during sg_trim() As trimming the sg table is merely an optimisation that gracefully fails if we cannot allocate a new table, we do not need to report the failure either. Fixes: `0c40ce130e` ("drm/i915: Trim the object sg table") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk (cherry picked from commit `8bfc478fa4`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2017-01-03 11:41:48 +02:00
Chris Wilson	c3f923b554	drm/i915: Don't clflush before release phys object When we teardown the backing storage for the phys object, we copy from the coherent contiguous block back to the shmemfs object, clflushing as we go. Trying to clflush the invalid sg beforehand just oops and would be redundant (due to it already being coherent, and clflushed afterwards). Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk (cherry picked from commit `e5facdf964`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2017-01-03 11:41:38 +02:00
Dave Airlie	4a401ceeef	Merge tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes First set of i915 fixes for code in next. * tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel: drm/i915: skip the first 4k of stolen memory on everything >= gen8 drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping drm/i915: Fix use after free in logical_render_ring_init drm/i915: disable PSR by default on HSW/BDW drm/i915: Fix setting of boost freq tunable drm/i915: tune down the fast link training vs boot fail drm/i915: Reorder phys backing storage release drm/i915/gen9: Fix PCODE polling during SAGV disabling drm/i915/gen9: Fix PCODE polling during CDCLK change notification drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating drm/i915: drop the struct_mutex when wedged or trying to reset	2016-12-23 05:28:02 +10:00
Chris Wilson	abb0deacb5	drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping If we at first do not succeed with attempting to remap our physical pages using a coalesced scattergather list, try again with one scattergather entry per page. This should help with swiotlb as it uses a limited buffer size and only searches for contiguous chunks within its buffer aligned up to the next boundary - i.e. we may prematurely cause a failure as we are unable to utilize the unused space between large chunks and trigger an error such as: i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes) Reported-by: Juergen Gross <jgross@suse.com> Tested-by: Juergen Gross <jgross@suse.com> Fixes: `871dfbd67d` ("drm/i915: Allow compaction upto SWIOTLB max segment size") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (cherry picked from commit `d766ef5300`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-12-20 16:30:47 +02:00
Chris Wilson	057f803ff1	drm/i915: Reorder phys backing storage release In commit `a4f5ea64f0` ("drm/i915: Refactor object page API"), I reordered the object->pages teardown to be more friendly wrt to a separate obj->mm.lock. However, I overlooked the phys object and left it with a dangling use-after-free of its phys_handle. Move the allocation of the phys handle to get_pages and it release to put_pages to prevent the invalid access and to improve symmetry. v2: Add commentary about always aligning to page size. Testcase: igt/drv_selftest/objects Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: `a4f5ea64f0` ("drm/i915: Refactor object page API") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk (cherry picked from commit `dbb4351bab`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-12-20 16:29:26 +02:00
Jan Kara	1a29d85eb0	mm: use vmf->address instead of of vmf->virtual_address Every single user of vmf->virtual_address typed that entry to unsigned long before doing anything with it so the type of virtual_address does not really provide us any additional safety. Just use masked vmf->address which already has the appropriate type. Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:09 -08:00
Chris Wilson	ce1135c7de	drm/i915: Complete requests in nop_submit_request Since the submit/execute split in commit `d55ac5bf97` ("drm/i915: Defer transfer onto execution timeline to actual hw submission") the global seqno advance was deferred until the submit_request callback. After wedging the GPU, we were installing a nop_submit_request handler (to avoid waking up the dead hw) but I had missed converting this over to the new scheme. Under the new scheme, we have to explicitly call i915_gem_submit_request() from the submit_request handler to mark the request as on the hardware. If we don't the request is always pending, and any waiter will continue to wait indefinitely and hangcheck will not be able to resolve the lockup. References: https://bugs.freedesktop.org/show_bug.cgi?id=98748 Testcase: igt/gem_eio/in-flight Fixes: `d55ac5bf97` ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk (cherry picked from commit `3dcf93f7f2`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-12-05 10:38:01 +02:00
Chris Wilson	05c348377d	drm/i915: Skip final clflush if LLC is coherent If the LLC is coherent with the object, we do not need to worry about whether main memory and cache mismatch when we hand the object back to the system. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-2-chris@chris-wilson.co.uk	2016-11-18 22:33:49 +00:00
Chris Wilson	a6a7cc4b7d	drm/i915: Always flush the dirty CPU cache when pinning the scanout Currently we only clflush the scanout if it is in the CPU domain. Also flush if we have a pending CPU clflush. We also want to treat the dirtyfb path similar, and flush any pending writes there as well. v2: Only send the fb flush message if flushing the dirt on flip v3: Make flush-for-flip and dirtyfb look more alike since they serve similar roles as end-of-frame marker. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2 Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-1-chris@chris-wilson.co.uk	2016-11-18 22:33:22 +00:00
Chris Wilson	b17993b7b2	drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error On the DMA mapping error path, sg may be NULL (it has already been marked as the last scatterlist entry), and we should avoid dereferencing it again. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `e227330223` ("drm/i915: avoid leaking DMA mappings") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: stable@vger.kernel.org Link: http://patchwork.freedesktop.org/patch/msgid/20161114112930.2033-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>	2016-11-18 20:51:53 +00:00
Chris Wilson	5b8c8aec8e	drm/i915: Move frontbuffer CS write tracking from ggtt vma to object I tried to avoid having to track the write for every VMA by only tracking writes to the ggtt. However, for the purposes of frontbuffer tracking this is insufficient as we need to invalidate around writes not just to the the ggtt but all aliased ppgtt views of the framebuffer. By moving the critical section to the object and only doing so for framebuffer writes we can reduce the tracking even further by only watching framebuffers and not vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-18 11:15:59 +00:00
Matthew Auld	ea84aa776f	drm/i915: don't leak global_timeline We need to clean up the global_timeline in i915_gem_load_cleanup. v2: don't forget about the struct_mutex, and also WARN_ON if we have any remaining timelines before purging the global_timeline. v3: it might be a good idea to first remove the global_timeline...duh! Fixes: `73cb97010d` ("drm/i915: Combine seqno + tracking into a global timeline struct") Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1479415087-13216-1-git-send-email-matthew.auld@intel.com Link: http://patchwork.freedesktop.org/patch/msgid/20161117210411.14044-2-chris@chris-wilson.co.uk	2016-11-17 21:05:37 +00:00
Chris Wilson	c4c29d7b59	drm/i915: Demote i915_gem_open() debugging from DRIVER to USER We use DRM_DEBUG() when reporting on user actions, to try and keep intentional errors out of the CI dmesg. Demote the debug from i915_gem_open() similarly so that it is only apparent with drm.debug & 1 like its brethren. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161109104507.21228-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-11-17 14:23:20 +00:00
Tvrtko Ursulin	275a991c03	drm/i915: dev_priv cleanup in i915_gem_gtt.c Started with removing INTEL_INFO(dev) and cascaded into a quite big trickle of function prototype changes. Still, I think it is for the better. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-17 13:56:12 +00:00
Tvrtko Ursulin	4362f4f6dd	drm/i915: Use dev_priv in INTEL_INFO in i915_gem_fence_reg.c Plus a small cascade of function prototype changes. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-17 13:56:06 +00:00
Tvrtko Ursulin	c6be607abc	drm/i915: dev_priv and a small cascade of cleanups in i915_gem.c Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-17 13:55:26 +00:00
Chris Wilson	6b5e90f58c	drm/i915/scheduler: Boost priorities for flips Boost the priority of any rendering required to show the next pageflip as we want to avoid missing the vblank by being delayed by invisible workload. We prioritise avoiding jank and jitter in the GUI over starving background tasks. v2: Descend dma_fence_array when boosting priorities. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-10-chris@chris-wilson.co.uk	2016-11-14 21:01:22 +00:00
Chris Wilson	20311bd350	drm/i915/scheduler: Execute requests in order of priorities Track the priority of each request and use it to determine the order in which we submit requests to the hardware via execlists. The priority of the request is determined by the user (eventually via the context) but may be overridden at any time by the driver. When we set the priority of the request, we bump the priority of all of its dependencies to match - so that a high priority drawing operation is not stuck behind a background task. When the request is ready to execute (i.e. we have signaled the submit fence following completion of all its dependencies, including third party fences), we put the request into a priority sorted rbtree to be submitted to the hardware. If the request is higher priority than all pending requests, it will be submitted on the next context-switch interrupt as soon as the hardware has completed the current request. We do not currently preempt any current execution to immediately run a very high priority request, at least not yet. One more limitation, is that this is first implementation is for execlists only so currently limited to gen8/gen9. v2: Replace recursive priority inheritance bumping with an iterative depth-first search list. v3: list_next_entry() for walking lists v4: Explain how the dfs solves the recursion problem with PI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk	2016-11-14 21:01:21 +00:00
Chris Wilson	52e5420907	drm/i915/scheduler: Record all dependencies upon request construction The scheduler needs to know the dependencies of each request for the lifetime of the request, as it may choose to reschedule the requests at any time and must ensure the dependency tree is not broken. This is in additional to using the fence to only allow execution after all dependencies have been completed. One option was to extend the fence to support the bidirectional dependency tracking required by the scheduler. However the mismatch in lifetimes between the submit fence and the request essentially meant that we had to build a completely separate struct (and we could not simply reuse the existing waitqueue in the fence for one half of the dependency tracking). The extra dependency tracking simply did not mesh well with the fence, and keeping it separate both keeps the fence implementation simpler and allows us to extend the dependency tracking into a priority tree (whilst maintaining support for reordering the tree). To avoid the additional allocations and list manipulations, the use of the priotree is disabled when there are no schedulers to use it. v2: Create a dedicated slab for i915_dependency. Rename the lists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk	2016-11-14 21:00:28 +00:00
Chris Wilson	663f71e73f	drm/i915: Remove engine->execlist_lock The execlist_lock is now completely subsumed by the engine->timeline->lock, and so we can remove the redundant layer of locking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-5-chris@chris-wilson.co.uk	2016-11-14 21:00:25 +00:00
Chris Wilson	bb89485e99	drm/i915: Create distinct lockclasses for execution vs user timelines In order to simplify the lockdep annotation, as they become more complex in the future with deferred execution and multiple paths through the same functions, create a separate lockclass for the user timeline and the hardware execution timeline. We should only ever be locking the user timeline and the execution timeline in parallel so we only need to create two lock classes, rather than a separate class for every timeline. v2: Rename the lock classes to be more consistent with other lockdep. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-2-chris@chris-wilson.co.uk	2016-11-14 21:00:21 +00:00
Chris Wilson	2b3c83176e	drm/i915: Stop skipping the final clflush back to system pages When we release the shmem backing storage, we make sure that the pages are coherent with the cpu cache. However, our clflush routine was skipping the flush as the object had no pages at release time. Fix this by explicitly flushing the sg_table we are decoupling. Fixes: `03ac84f183` ("drm/i915: Pass around sg_table to get_pages/put_pages backend") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-2-chris@chris-wilson.co.uk	2016-11-11 16:21:52 +00:00
Chris Wilson	9caa34aa93	drm/i915: Only wait upon the execution timeline when unlocked In order to walk the list of all timelines, we currently require the struct_mutex. We are sometimes called prior to the struct_mutex being taken by the caller (i.e !I915_WAIT_LOCKED) in which case we can only trust the global execution timelines (as these are owned by the device). This means in the unlocked phase we can only wait upon the currently executing requests and not all queued. [ 175.743243] general protection fault: 0000 [#1] SMP [ 175.743263] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iwlwifi aesni_intel aes_x86_64 lrw snd_soc_rt5640 gf128mul snd_soc_rl6231 snd_soc_core glue_helper snd_compress snd_pcm_dmaengine snd_hda_codec_hdmi ablk_helper snd_hda_codec_realtek cryptd snd_hda_codec_generic serio_raw cfg80211 snd_hda_intel snd_hda_codec ir_lirc_codec snd_hda_core lirc_dev snd_hwdep snd_pcm lpc_ich mei_me mei snd_seq_midi shpchp snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer rc_rc6_mce acpi_als nuvoton_cir kfifo_buf rc_core snd industrialio snd_soc_sst_acpi soundcore snd_soc_sst_match i2c_designware_platform 8250_dw i2c_designware_core dw_dmac spi_pxa2xx_platform mac_hid acpi_pad parport_pc ppdev lp parport [ 175.743509] autofs4 i915 e1000e psmouse ptp pps_core xhci_pci ehci_pci ahci xhci_hcd ehci_hcd libahci video sdhci_acpi sdhci i2c_hid hid [ 175.743560] CPU: 2 PID: 2386 Comm: wtdg_monitor.sh Tainted: G U 4.9.0-rc4-nightly+ #2 [ 175.743581] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0358.2016.0606.1423 06/06/2016 [ 175.743603] task: ffff88024509ba80 task.stack: ffffc9007bd18000 [ 175.743618] RIP: 0010:[<ffffffffa01af29b>] [<ffffffffa01af29b>] i915_gem_wait_for_idle+0x3b/0x140 [i915] [ 175.743660] RSP: 0000:ffffc9007bd1b9b8 EFLAGS: 00010297 [ 175.743674] RAX: ffff88024489d248 RBX: 0000000000000000 RCX: 0000000000000000 [ 175.743691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880244898000 [ 175.743708] RBP: ffffc9007bd1b9f0 R08: 0000000000000000 R09: 0000000000000001 [ 175.743724] R10: 00000028eaf42792 R11: 0000000000000001 R12: dead000000000100 [ 175.743741] R13: dead000000000148 R14: ffffc9007bd1ba5f R15: 0000000000000005 [ 175.743758] FS: 00007f2638330700(0000) GS:ffff880256d00000(0000) knlGS:0000000000000000 [ 175.743777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 175.743791] CR2: 00007f885c8cea40 CR3: 00000002416b5000 CR4: 00000000003406e0 [ 175.743808] Stack: [ 175.743816] ffff88024489d248 000000004509ba80 ffff880244898000 ffff88024509ba80 [ 175.743840] 00000000ffff8b69 ffffc9007bd1ba5f ffffc9007bd1ba5e ffffc9007bd1ba28 [ 175.743863] ffffffffa01b661d 00000000ffffffff 0000000000000000 ffff880244898000 [ 175.743886] Call Trace: [ 175.743906] [<ffffffffa01b661d>] i915_gem_shrinker_lock_uninterruptible.constprop.5+0x5d/0xc0 [i915] [ 175.743937] [<ffffffffa01b6cd0>] i915_gem_shrinker_oom+0x30/0x1b0 [i915] [ 175.743955] [<ffffffff8109ca79>] notifier_call_chain+0x49/0x70 [ 175.743971] [<ffffffff8109cd9d>] __blocking_notifier_call_chain+0x4d/0x70 [ 175.743988] [<ffffffff8109cdd6>] blocking_notifier_call_chain+0x16/0x20 [ 175.744005] [<ffffffff811885dc>] out_of_memory+0x22c/0x480 [ 175.744020] [<ffffffff81205542>] __alloc_pages_slowpath+0x851/0x8ec [ 175.744037] [<ffffffff8118ca51>] __alloc_pages_nodemask+0x2c1/0x310 [ 175.744054] [<ffffffff811d8ea8>] alloc_pages_current+0x88/0x120 [ 175.744070] [<ffffffff811833a4>] __page_cache_alloc+0xb4/0xc0 [ 175.744086] [<ffffffff811865ca>] filemap_fault+0x29a/0x500 [ 175.744101] [<ffffffff81299aa6>] ext4_filemap_fault+0x36/0x50 [ 175.744117] [<ffffffff811b3d4a>] __do_fault+0x6a/0xe0 [ 175.744131] [<ffffffff811b97ee>] handle_mm_fault+0xd0e/0x1330 [ 175.744147] [<ffffffff8106738c>] __do_page_fault+0x23c/0x4d0 [ 175.744162] [<ffffffff81067650>] do_page_fault+0x30/0x80 [ 175.744177] [<ffffffff817ffbe8>] page_fault+0x28/0x30 [ 175.744191] Code: 41 57 41 56 41 55 41 54 53 48 83 ec 10 4c 8b a7 48 52 00 00 89 75 d4 48 89 45 c8 49 39 c4 74 78 4d 8d 6c 24 48 41 bf 05 00 00 00 <49> 8b 5d 00 48 85 db 74 50 8b 83 20 01 00 00 85 c0 74 15 48 8b [ 175.744320] RIP [<ffffffffa01af29b>] i915_gem_wait_for_idle+0x3b/0x140 [i915] [ 175.744351] RSP <ffffc9007bd1b9b8> Fixes: `80b204bce8` ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-1-chris@chris-wilson.co.uk	2016-11-11 16:21:52 +00:00
Tvrtko Ursulin	0031fb9685	drm/i915: Assorted dev_priv cleanups A small selection of macros which can only accept dev_priv from now on and a resulting trickle of fixups. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>	2016-11-11 14:58:26 +00:00
Joonas Lahtinen	b42fe9ca0a	drm/i915: Split out i915_vma.c As a side product, had to split two other files; - i915_gem_fence_reg.h - i915_gem_object.h (only parts that needed immediate untanglement) I tried to move code in as big chunks as possible, to make review easier. i915_vma_compare was moved to a header temporarily. v2: - Use i915_gem_fence_reg.{c,h} v3: - Rebased v4: - Fix building when DEBUG_GEM is enabled by reordering a bit. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com	2016-11-11 14:34:54 +02:00
Tvrtko Ursulin	0c40ce130e	drm/i915: Trim the object sg table At the moment we allocate enough sg table entries assuming we will not be able to do any coalescing. But since in practice we most often can, and more so very effectively, this ends up wasting a lot of memory. A simple and effective way of trimming the over-allocated entries is to copy the table over to a new one allocated to the exact size. Experiments on my freshly logged and idle desktop (KDE) showed that by doing this we can save approximately 1 MiB of RAM, or when running a typical benchmark like gl_manhattan I have even seen a 6 MiB saving. More complicated techniques such as only copying the last used page and freeing the rest are left to the reader. v2: * Update commit message. * Use temporary sg_table on stack. (Chris Wilson) v3: * Commit message update. * Comment added. * Replace memcpy with copy assignment. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478704423-7447-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-11-10 09:29:25 +00:00
Chris Wilson	d0da48cf92	drm/i915: Remove chipset flush after cache flush We always flush the chipset prior to executing with the GPU, so we can skip the flush during ordinary domain management. This should help mitigate some of the potential performance regressions, but likely trivial, from doing the flush unconditionally before execbuf introduced in commit `dcd79934b0` ("drm/i915: Unconditionally flush any chipset buffers before execbuf") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161106130001.9509-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-08 11:04:04 +00:00
Imre Deak	31ab49abde	drm/i915: Add assert for no pending GPU requests during suspend/resume in LR mode During resume we will reset the SW/HW tracking for each ring head/tail pointers and so are not prepared to replay any pending requests (as opposed to GPU reset time). Add an assert for this both to the suspend and the resume code. v2: - Check for ELSP port idle already during suspend and check !gt.awake during resume. (Chris) v3: - Move the !gt.awake check to i915_gem_resume(). v4: - s/intel_lr_engines_idle/intel_execlists_idle/ (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-4-git-send-email-imre.deak@intel.com	2016-11-07 14:48:05 +02:00
Imre Deak	0cb5670baa	drm/i915: Make sure engines are idle during GPU idling in LR mode We assume that the GPU is idle once receiving the seqno via the last request's user interrupt. In execlist mode the corresponding context completed interrupt can be delayed though and until this latter interrupt arrives we consider the request to be pending on the ELSP submit port. This can cause a problem during system suspend where this last request will be seen by the resume code as still pending. Such pending requests are normally replayed after a GPU reset, but during resume we reset both SW and HW tracking of the ring head/tail pointers, so replaying the pending request with its stale tail pointer will leave the ring in an inconsistent state. A subsequent request submission can lead then to the GPU executing from uninitialized area in the ring behind the above stale tail pointer. Fix this by making sure any pending request on the ELSP port is completed before suspending. I used a polling wait since the completion time I measured was <1ms and since normally we only need to wait during system suspend. GPU idling during runtime suspend is scheduled with a delay (currently 50-100ms) after the retirement of the last request at which point the context completed interrupt must have arrived already. The chance of this bug was increased by commit `1c777c5d1d` Author: Imre Deak <imre.deak@intel.com> Date: Wed Oct 12 17:46:37 2016 +0300 drm/i915/hsw: Fix GPU hang during resume from S3-devices state but it could happen even without the explicit GPU reset, since we disable interrupts afterwards during the suspend sequence. v2: - Do an unlocked poll-wait first. (Chris) v3-4: - s/intel_lr_engines_idle/intel_execlists_idle/ and move i915.enable_execlists check to the new helper. (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98470 Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-3-git-send-email-imre.deak@intel.com	2016-11-07 14:48:05 +02:00
Imre Deak	93c97dc17f	drm/i915: Avoid early GPU idling due to race with new request There is a small race where a new request can be submitted and retired after the idle worker started to run which leads to idling the GPU too early. Fix this by deferring the idling to the pending instance of the worker. This scenario was pointed out by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-2-git-send-email-imre.deak@intel.com	2016-11-07 14:48:04 +02:00
Chris Wilson	767a222e47	drm/i915: Limit Valleyview and earlier to only using mappable scanout Valleyview appears to be limited to only scanning out from the first 512MiB of the Global GTT. Lets presume that this behaviour was inherited from the display block copied from g4x (not Ironlake) and all earlier generations are similarly affected, though testing suggests different symptoms. For simplicity, impose that these platforms must scanout from the mappable region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this catches Cherryview which does not appear to be limited to the low aperture for its scanout.) v2: Use HAS_GMCH_DISPLAY() to more clearly convey my intent about limiting this workaround to the old style of display engine. v3: Update changelog to reflect testing by Ville Syrjälä v4: Include the changes to the comments as well Reported-by: Luis Botello <luis.botello.ortega@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98036 Fixes: `2efb813d53` ("drm/i915: Fallback to using unmappable memory for scanout") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+ Link: http://patchwork.freedesktop.org/patch/msgid/20161107110128.28762-1-chris@chris-wilson.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com	2016-11-07 12:25:50 +00:00
Chris Wilson	0ef723cbce	drm/i915: Round tile chunks up for constructing partial VMAs When we split a large object up into chunks for GTT faulting (because we can't fit the whole object into the aperture) we have to align our cuts with the fence registers. Each partial VMA must cover a complete set of tile rows or the offset into each partial VMA is not aligned with the whole image. Currently we enforce a minimum size on each partial VMA, but this minimum size itself was not aligned to the tile row causing distortion. Reported-by: Andreas Reis <andreas.reis@gmail.com> Reported-by: Chris Clayton <chris2553@googlemail.com> Reported-by: Norbert Preining <preining@logic.at> Tested-by: Norbert Preining <preining@logic.at> Tested-by: Chris Clayton <chris2553@googlemail.com> Fixes: `03af84fe7f` ("drm/i915: Choose partial chunksize based on tile row size") Fixes: `a61007a83a` ("drm/i915: Fix partial GGTT faulting") # enabling patch Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98402 Testcase: igt/gem_mmap_gtt/medium-copy-odd Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+ Link: http://patchwork.freedesktop.org/patch/msgid/20161107105443.27855-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-07 11:39:59 +00:00
Chris Wilson	2c3a3f44dc	drm/i915: Fix pages pin counting around swizzle quirk commit `bc0629a767` ("drm/i915: Track pages pinned due to swizzling quirk") fixed one problem, but revealed a whole lot more. The root cause of the pin count mismatch for the swizzle quirk (for L-shaped memory on gen3/4) was that we were incrementing the pages_pin_count upon getting the backing pages but then overwriting the pages_pin_count to set it to 1 afterwards. With a little bit of adjustment to satisfy the GEM_BUG_ON sanitychecks, the fix is to replace the explicit atomic_set with an atomic_inc. v2: Consistently use atomics (not mix atomics and helpers) within the lowlevel get_pages routines. This makes the atomic operations much clearer. Fixes: `1233e2db19` ("drm/i915: Move object backing storage manipulation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161104103001.27643-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-04 11:55:39 +00:00
Tvrtko Ursulin	a933568eb6	drm/i915: Tidy slab cache allocations We can use the preferred KMEM_CACHE helper for brevity. Also simplifiy error unwind by only setting the ENOMEM error code once. v2: Add forgotten changes. (Joonas Lahtinen) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1) Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478099699-28652-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-11-03 13:52:57 +00:00
Joonas Lahtinen	56cea32382	drm/i915: Unify global_list into global_link $ sed -i -r 's/\bglobal_list\b/global_link/g' .c .h Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478081764-8058-1-git-send-email-joonas.lahtinen@linux.intel.com	2016-11-02 15:17:13 +02:00
Tvrtko Ursulin	3599a91cc8	drm/i915: Allow shrinking of userptr objects once again Commit `1bec9b0bda` ("drm/i915/shrinker: Only shmemfs objects are backed by swap") stopped considering the userptr objects in shrinker callbacks. Restore that so idle userptr objects can be discarded in order to free up memory. One change further to what was introduced in `1bec9b0bda` is to start considering userptr objects in oom but that should also be a correct thing to do. v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `1bec9b0bda` ("drm/i915/shrinker: Only shmemfs objects are backed by swap") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: <stable@vger.kernel.org> Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-11-01 16:35:26 +00:00
Ville Syrjälä	a26e523921	drm/i915: Pass dev_priv to IS_BROADWATER/IS_CRESTLINE Unify our approach to things by passing around dev_priv instead of dev. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1477946245-14134-21-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-01 16:40:38 +02:00
Chris Wilson	548625ee8f	drm/i915: Improve lockdep tracking for obj->mm.lock The shrinker may appear to recurse into obj->mm.lock as the shrinker may be called from a direct reclaim path whilst handling get_pages. We filter out recursing on the same obj->mm.lock by inspecting obj->mm.pages, but we do want to take the lock on a second object in order to reap their pages. lockdep spots the recursion on the same lockclass and needs annotation to avoid a false positive. To keep the two paths distinct, create an enum to indicate which subclass of obj->mm.lock we are using. This removes the false positive and avoids masking real bugs. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-01 13:01:44 +00:00
Chris Wilson	db6c2b4151	drm/i915: Store the vma in an rbtree under the object With full-ppgtt one of the main bottlenecks is the lookup of the VMA underneath the object. For execbuf there is merit in having a very fast direct lookup of ctx:handle to the vma using a hashtree, but that still leaves a large number of other lookups. One way to speed up the lookup would be to use a rhashtable, but that requires extra allocations and may exhibit poor worse case behaviour. An alternative is to use an embedded rbtree, i.e. no extra allocations and deterministic behaviour, but at the slight cost of O(lgN) lookups (instead of O(1) for rhashtable). The major of such tree will be very shallow and so not much slower, and still scales much, much better than the current unsorted list. v2: Bump vma_compare() to return a long, as we return the result of comparing two pointers. References: https://bugs.freedesktop.org/show_bug.cgi?id=87726 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk	2016-11-01 13:00:40 +00:00
Chris Wilson	bc0629a767	drm/i915: Track pages pinned due to swizzling quirk If we have a tiled object and an unknown CPU swizzle pattern, we pin the pages to prevent the object from being swapped out (and us corrupting the contents as we do not know the access pattern and so cannot convert it to linear and back to tiled on reuse). This requires us to remember to drop the extra pinning when freeing the object, or else we trigger warnings about the pin leak. In commit `fbbd37b36f` ("drm/i915: Move object release to a freelist + worker"), the object free path was deferred to a worker, but the unpinning of the quirk, along with marking the object as reclaimable, was left on the immediate path (so that if required we could reclaim the pages under memory pressure as early as possible). However, this split introduced a bug where the pages were no longer being unpinned if they were marked as unneeded. [ 231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915] [ 231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj)) [ 231.800405] Modules linked in: [ 231.800406] snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915] [ 231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G U 4.9.0-rc2-CI-CI_DRM_1780+ #1 [ 231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009 [ 231.800456] Workqueue: events __i915_gem_free_work [i915] [ 231.800459] ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000 [ 231.800465] ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000 [ 231.800469] ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0 [ 231.800474] Call Trace: [ 231.800479] [<ffffffff8142dd65>] dump_stack+0x67/0x92 [ 231.800484] [<ffffffff8107e4e6>] __warn+0xc6/0xe0 [ 231.800487] [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50 [ 231.800491] [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340 [ 231.800520] [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915] [ 231.800548] [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915] [ 231.800552] [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0 [ 231.800555] [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0 [ 231.800558] [<ffffffff8109c789>] worker_thread+0x49/0x490 [ 231.800561] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0 [ 231.800563] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0 [ 231.800566] [<ffffffff810a2aab>] kthread+0xeb/0x110 [ 231.800569] [<ffffffff810a29c0>] ? kthread_park+0x60/0x60 [ 231.800573] [<ffffffff818164a7>] ret_from_fork+0x27/0x40 Moving to a separate flag for tracking the quirked pin is overkill for the bug (since we only have to interchange the two tests in i915_gem_free_object) but it does reduce a complicated test on all objects and provide a sanitycheck for uncommon code paths. Fixes: `fbbd37b36f` ("drm/i915: Move object release to a freelist + worker") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-2-chris@chris-wilson.co.uk	2016-11-01 10:48:41 +00:00
Chris Wilson	cb399eabc4	drm/i915: Avoid accessing request->timeline outside of its lifetime Whilst waiting on a request, we may do so without holding any locks or any guards beyond a reference to the request. In order to avoid taking locks within request deallocation, we drop references to its timeline (via the context and ppgtt) upon retirement. We should avoid chasing such pointers outside of their control, in particular we inspect the request->timeline to see if we may restore the RPS waitboost for a client. If we instead look at the engine->timeline, we will have similar behaviour on both full-ppgtt and !full-ppgtt systems and reduce the amount of reward we give towards stalling clients (i.e. only if the client stalls and the GPU is uncontended does it reclaim its boost). This restores behaviour back to pre-timelines, whilst fixing: [ 645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0 [ 645.078577] Read of size 4 by task gem_exec_schedu/28408 [ 645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64 [ 645.078724] Hardware name: / , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015 [ 645.078816] ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200 [ 645.078998] ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200 [ 645.079172] ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796 [ 645.079345] Call Trace: [ 645.079404] [<ffffffff8143d059>] dump_stack+0x68/0x9f [ 645.079467] [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70 [ 645.079534] [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0 [ 645.079601] [<ffffffff8122a244>] kasan_report+0x34/0x40 [ 645.079676] [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0 [ 645.079741] [<ffffffff81229951>] __asan_load4+0x61/0x80 [ 645.079807] [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0 [ 645.079876] [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590 [ 645.079944] [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500 [ 645.080016] [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0 [ 645.080084] [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210 [ 645.080157] [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0 [ 645.080226] [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690 [ 645.080296] [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690 [ 645.080366] [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0 [ 645.080433] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640 [ 645.080508] [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0 [ 645.080603] [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120 [ 645.080670] [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210 [ 645.080738] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20 [ 645.080804] [<ffffffff8120268c>] ? do_mmap+0x47c/0x580 [ 645.080871] [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140 [ 645.080938] [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150 [ 645.081011] [<ffffffff81108c53>] ? up_write+0x23/0x50 [ 645.081078] [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140 [ 645.081145] [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90 [ 645.081214] [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0 [ 645.082030] [<ffffffff81288408>] ? __fget+0x168/0x250 [ 645.082106] [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1 [ 645.082176] [<ffffffff81288592>] ? __fget_light+0xa2/0xc0 [ 645.082242] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70 [ 645.082309] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192 [ 645.082431] Allocated: [ 645.082480] PID = 28408 [ 645.082535] [ 645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20 [ 645.082623] [ 645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0 [ 645.082716] [ 645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0 [ 645.082817] [ 645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220 [ 645.082908] [ 645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560 [ 645.083027] [ 645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0 [ 645.083152] [ 645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640 [ 645.083243] [ 645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20 [ 645.083334] [ 645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70 [ 645.083432] [ 645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 645.083551] Freed: [ 645.083599] PID = 27629 [ 645.083648] [ 645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20 [ 645.083738] [ 645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0 [ 645.083830] [ 645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0 [ 645.083922] [ 645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170 [ 645.084021] [ 645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180 [ 645.084139] [ 645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230 [ 645.084257] [ 645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230 [ 645.084380] [ 645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630 [ 645.084500] [ 645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280 [ 645.085313] [ 645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0 [ 645.085440] [ 645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920 [ 645.085532] [ 645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840 [ 645.085622] [ 645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0 [ 645.085718] [ 645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40 [ 645.085811] Memory state around the buggy address: [ 645.085869] ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 645.085956] ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 645.086138] ^ [ 645.086193] ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 645.086283] ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb v2: Add a comment to document the hint like nature of intel_engine_last_submit() Fixes: `73cb97010d` ("drm/i915: Combine seqno + tracking into a global timeline struct") Fixes: `80b204bce8` ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-1-chris@chris-wilson.co.uk	2016-11-01 10:48:40 +00:00
Chris Wilson	7d5d59e527	drm/i915: Use the full hammer when shutting down the rcu tasks To flush all call_rcu() tasks (here from i915_gem_free_object()) we need to call rcu_barrier() (not synchronize_rcu()). If we don't then we may still have objects being freed as we continue to teardown the driver - in particular, the recently released rings may race with the memory manager shutdown resulting in sporadic: [ 142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40 [ 142.217187] Memory manager not clean during takedown. [ 142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel] [ 142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1 [ 142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013 [ 142.217200] ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000 [ 142.217202] ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48 [ 142.217204] ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0 [ 142.217206] Call Trace: [ 142.217209] [<ffffffff8142dd65>] dump_stack+0x67/0x92 [ 142.217211] [<ffffffff8107e4e6>] __warn+0xc6/0xe0 [ 142.217213] [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50 [ 142.217214] [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40 [ 142.217236] [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915] [ 142.217246] [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915] [ 142.217253] [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915] [ 142.217260] [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915] [ 142.217268] [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915] [ 142.217269] [<ffffffff8147d214>] pci_device_remove+0x34/0xb0 [ 142.217271] [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150 [ 142.217272] [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0 [ 142.217273] [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0 [ 142.217274] [<ffffffff8157c787>] driver_unregister+0x27/0x50 [ 142.217276] [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70 [ 142.217287] [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915] [ 142.217289] [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0 [ 142.217291] [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 142.217292] ---[ end trace 6fd164859c154772 ]--- [ 142.217505] [drm:show_leaks] ERROR node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at [<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0 [<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360 [<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915] [<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915] [<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915] [<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915] [<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915] [<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915] [<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915] [<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915] [<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915] [<ffffffff8147d315>] pci_device_probe+0x85/0xf0 [<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430 [<ffffffff8157baee>] __driver_attach+0xde/0xe0 In particular note that the node was being poisoned as we inspected the list, a clear indication that the object is being freed as we make the assertion. v2: Don't loop, just assert that we do all the work required as that will be better at detecting further errors. Fixes: `fbbd37b36f` ("drm/i915: Move object release to a freelist + worker") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101084843.3961-1-chris@chris-wilson.co.uk	2016-11-01 09:30:08 +00:00
Chris Wilson	80b204bce8	drm/i915: Enable multiple timelines With the infrastructure converted over to tracking multiple timelines in the GEM API whilst preserving the efficiency of using a single execution timeline internally, we can now assign a separate timeline to every context with full-ppgtt. v2: Add a comment to indicate the xfer between timelines upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk	2016-10-28 20:53:57 +01:00
Chris Wilson	28176ef4cf	drm/i915: Reserve space in the global seqno during request allocation A restriction on our global seqno is that they cannot wrap, and that we cannot use the value 0. This allows us to detect when a request has not yet been submitted, its global seqno is still 0, and ensures that hardware semaphores are monotonic as required by older hardware. To meet these restrictions when we defer the assignment of the global seqno, we must check that we have an available slot in the global seqno space during request construction. If that test fails, we wait for all requests to be completed and reset the hardware back to 0. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk	2016-10-28 20:53:56 +01:00
Chris Wilson	65e4760e39	drm/i915: Introduce a global_seqno for each request Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking execution of requests simple, and vital for preserving a single waiter (i.e. so that we can order the waiters so that only the earliest to wakeup need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) and that used internally for execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	3033acab07	drm/i915: Queue the idling context switch after all other timelines Before suspend, we wait for the switch to the kernel context. In order for all the other context images to be complete upon suspend, that switch must be the last operation by the GPU (i.e. this idling request must not overtake any pending requests). To make this request execute last, we make it depend on every other inflight request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk	2016-10-28 20:53:52 +01:00
Chris Wilson	73cb97010d	drm/i915: Combine seqno + tracking into a global timeline struct Our timelines are more than just a seqno. They also provide an ordered list of requests to be executed. Due to the restriction of handling individual address spaces, we are limited to a timeline per address space but we use a fence context per engine within. Our first step to introducing independent timelines per context (i.e. to allow each context to have a queue of requests to execute that have a defined set of dependencies on other requests) is to provide a timeline abstraction for the global execution queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk	2016-10-28 20:53:51 +01:00

1 2 3 4 5 ...

1610 Commits