The patch before this changed the way in which we allocate space for the
PPGTT PDEs. It began carving out the PPGTT PDEs (which live in the
Global GTT) from the GGTT's drm_mm. Prior to that patch, the PDEs were
hidden from the drm_mm, and therefore could never fail to be allocated.
In unfortunate cases, the drm_mm may be full when we want to allocate
the space. This can technically occur whenever we try to allocate, which
happens in two places currently. Practically, it can only really ever
happen at GPU reset.
Later, when we allocate more PDEs for multiple PPGTTs, this will
potentially be even more useful.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When PPGTT support was originally enabled, it was only designed to
support 1 PPGTT. It therefore made sense to simply hide the GGTT space
required to enable this from the drm_mm allocator.
Since we intend to support full PPGTT, which means more than 1, and
they can be created and destroyed ad hoc, it will be required to use
the proper allocation techniques we already have.
The first step here is to make the existing single PPGTT use the
allocator.
The astute observer will notice that we are reserving space in the GGTT
for the PDEs for the lifetime of the address space, and would be right
to question whether or not this is a good idea. With the current patch,
which only covers the aliasing PPGTT, it makes no difference (indeed the
PDEs should still be hidden from the shrinker). For the future, we are
allocating from top to bottom to avoid using the precious "gtt space".
The GGTT space at that point should only be used for scanout, HW
contexts, ringbuffers, HWSP, PDEs, and a couple of other small buffers
(potentially) used by the kernel. Everything else should be mapped into
a PPGTT. To put the consumption in more tangible terms, it takes
approximately 4 sets of PDEs to equal one 19x10 framebuffer (with no
fancy stride or alignment constraints). 3/4 of the total [average] GGTT
can be used for PDEs, and hopefully never touch the 1/4 that the
framebuffer needs.
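
To make the scheme concrete, here is a minimal sketch of the
reservation using the generic drm_mm API (modern flag names; the
GEN6_* constants just illustrate the 2MB / 64B-alignment figures
above and are not this patch's literal code):

#include <drm/drm_mm.h>

#define GEN6_PPGTT_PD_ENTRIES 512
#define GEN6_PD_SIZE  (GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE) /* 2MB of GGTT */
#define GEN6_PD_ALIGN (PAGE_SIZE * 16) /* PDEs 64B-aligned within the GTT */

/* Carve the PDE area out of the GGTT's drm_mm instead of hiding it,
 * searching top-down so the contested low range stays free for
 * scanout and friends.  This can now fail with -ENOSPC, which the
 * caller must handle. */
static int reserve_pd_space(struct drm_mm *ggtt_mm, struct drm_mm_node *node,
                            u64 ggtt_total)
{
        return drm_mm_insert_node_in_range(ggtt_mm, node,
                                           GEN6_PD_SIZE, GEN6_PD_ALIGN,
                                           0 /* color */, 0, ggtt_total,
                                           DRM_MM_INSERT_HIGH);
}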
The astute, and persistent observer might ask about the page tables
which are also pinned for the address space. This waste is unfortunate.
We use 2MB of memory per address space. We leave wrapping the PDEs as a
real GEM object as a TODO.
v2: Align PDEs to 64b in GTT
Allocate the node dynamically so we can use drm_mm_put_block
Now tested on IGT
Allocate node at the top to avoid fragmentation (Chris)
v3: Use Chris' top down allocator
v4: Embed drm_mm_node into ppgtt struct (Jesse)
Remove hunks which didn't belong (Jesse)
v5: Don't subtract guard page since we now killed the guard page prior
to this patch. (Ben)
v6: Rebased and removed guard page stuff.
Added a chunk to the commit message
Allow adding a context to mappable region
v7: Undo v3, so we can make the drm patch last in the series
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v4)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
squash: drm/i915: allow PPGTT to use mappable
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(); it essentially serves as
the per-platform PTE writer. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.
What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.
drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage
drm/i915: Add bind/unbind object functions to VMA
As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.
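
Sketched (fields abridged; per v6 below the pointers end up on the
VMA itself):

struct i915_vma {
        /* ... node, obj, vm ... */
        void (*bind_vma)(struct i915_vma *vma,
                         enum i915_cache_level cache_level,
                         u32 flags);            /* e.g. a GLOBAL_BIND bit */
        void (*unbind_vma)(struct i915_vma *vma);
};

With this in place, callers simply do vma->bind_vma(vma, level, flags)
and the VM-specific implementation picks the right page table updates.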
Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.
v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)
v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.
v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)
v5: Update the comment to not suck (Chris)
v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: Use the new vm [un]bind functions
Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.
Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.
v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initially, but we cannot.
v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)
At this point the original mailing list thread diverges, i.e.:
v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound BOs
Properly handle secure dispatch
Rebased on vma bind/unbind conversion
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: reduce vm->insert_entries() usage
FKA: drm/i915: eliminate vm->insert_entries()
With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want to, remove clear_range; however,
it's not totally easy at this point, since it's still used in a couple
of places that don't only deal in objects: setup, ppgtt init, and
restoring gtt mappings.
v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.
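
In other words, insert_entries survives as the platform PTE writer
that the generic bind path funnels into; a hedged sketch (exact
signatures illustrative):

static void ppgtt_bind_vma(struct i915_vma *vma,
                           enum i915_cache_level cache_level, u32 flags)
{
        WARN_ON(flags); /* no global/aliasing flags for plain PPGTT */

        vma->vm->insert_entries(vma->vm, vma->obj->pages,
                                vma->node.start >> PAGE_SHIFT, cache_level);
}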
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The initial implementation of this function used MMIO to write the PDPs.
Upon review it was determined (correctly) that the docs say to use LRI.
The issue is there are times where we want to do a synchronous write
(GPU reset).
I've tested this, and it works. I've verified with as many people as
possible that it should work.
This should fix the failing reset problems.
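
A sketch of the resulting helper (struct and register names
approximate the era's driver; the MMIO path exists because we cannot
submit commands during GPU reset):

static int gen8_write_pdp(struct intel_ring_buffer *ring, unsigned entry,
                          dma_addr_t addr, bool synchronous)
{
        struct drm_i915_private *dev_priv = ring->dev->dev_private;
        int ret;

        if (synchronous) {
                /* Synchronous MMIO write, e.g. at GPU reset */
                I915_WRITE(GEN8_RING_PDP_UDW(ring, entry), upper_32_bits(addr));
                I915_WRITE(GEN8_RING_PDP_LDW(ring, entry), lower_32_bits(addr));
                return 0;
        }

        /* Normal case: LRI, as the docs require */
        ret = intel_ring_begin(ring, 6);
        if (ret)
                return ret;

        intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
        intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
        intel_ring_emit(ring, upper_32_bits(addr));
        intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
        intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
        intel_ring_emit(ring, lower_32_bits(addr));
        intel_ring_advance(ring);

        return 0;
}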
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Our VM code already has a cleanup function, and this is a nice place to
put the drm_mm_takedown. This should have no functional impact; it just
leaves the unload function a bit cleaner, and is more logical IMO.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This should really have been added in BDW integration, as well as:
commit 93bd8649db
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Tue Jul 16 16:50:06 2013 -0700
drm/i915: Put the mm in the parent address space
It didn't really matter before, but it will in the future.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we fail for some reason on loading the PDPs, it would be wise to
disable the PPGTT in the ring registers. If we do not do this, we have
undefined results.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull in Jani's backlight rework branch. This was merged through a
separate branch to be able to sort out the Broadwell conflicts
properly before pulling it into the main development branch.
Conflicts:
drivers/gpu/drm/i915/intel_display.c
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Because of the way in which we're allocating the pages for the Aliasing
PPGTT, we cannot actually successfully alloc enough space for anything
greater than 2GB.
Instead of a quick hack to fix this, we should defer until we have the
real solution in place (allocating much less contiguous space).
This wasn't found sooner because we did not have any systems
supporting more than a 2GB GTT.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I am unclear how this got messed up in the shuffle, but it did.
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Preallocated objects will already have been added to the vma_list when
creating their ggtt vma entry, and coincidentally also marked as holding
a ggtt mapping. Repeating the vma_list manipulation when setting up the
ggtt after preallocation is a recipe for an unhappy kernel.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Use the improved commit message suggested by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Resolve rebase conflicts and switch to gen < 8 color for GenX
checking.
v3: Rebase on top of the address space refactoring.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Squash in fix from Ben: Set PPGTT batches as necessary
This fixes the regression in the last couple of days when we enabled
PPGTT.
v3: Squash in fixup to still use GTT for secure batches from Ville:
BDW doesn't have a separate secure vs. non-secure bit in
MI_BATCH_BUFFER_START. So for secure batches we have to simply
leave the PPGTT bit unset. Fortunately older generations (except
HSW) had similar limitations so execbuffer already creates a GTT
mapping for all secure batches.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Legacy PPGTT on GEN8 requires programming 4 PDP registers per ring.
Since all rings are using the same address space with the current code
the logic is simply to program all the tables we've setup for the PPGTT.
v2: Turn on PPGTT in GFX_MODE
v3: v2 was the wrong patch
v4: Resolve conflicts due to patch series reordering.
v5: Squash in fixup from Ben: Use LRI to write PDPs
The docs (and simulator seems to back up) suggest that we can only
program legacy PPGTT PDPs with LRI commands.
v6: Rebase around context differences conflicts.
v7: Use #defines for per ring PDPs. (Damien)
v8: Don't use typedef'd private_t.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (up to v3 and v7)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GEN8 insertion is very similar to GEN6.
v2: Rebase on top of Imre's for_each_sg_page helpers.
v3: Fixup my conversion (spotted by Ville).
v4: Rebase on top of the address space refactoring.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GEN8 PPGTT range clearing is very similar to GEN6 if we assume that our
PDEs are all valid, which they should be.
v2: Rebase on top of the address space refactoring.
v3: Rebase on top of the bool use_scratch addition to the clear_range interface.
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The upcoming clear and insert routines will expect that PDEs all point
to valid Page Directories. Doing that lazily doesn't really buy us
anything.
The page allocation is done earlier in init regardless, so it
shouldn't hurt to set the PDEs.
v2: Squash in patches to implement fixed PDE write function:
- If I had done this in the first place, the bug that's going to be
fixed in an upcoming patch would have been much easier to find.
- Use WB for PDEs.
The PAT bit is used for page size. 2MB PDEs aren't even supported in
BDW, so this was completely invalid. The solution is to make our
PDEs WB+LLC instead of the previous WB+eLLC. As far as I can guess,
this change won't matter for performance.
Thanks to Ville for the quick correction when discussing on IRC.
v3: Return the pde type for pde encoding (Damien)
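
Sketched, the fixed PDE write from v2/v3 (macro names approximate;
the hard constraint is keeping the PAT bit clear, since in a PDE it
selects 2MB pages):

/* Encode a PDE pointing at a valid page table.  The PAT bit must stay
 * clear (in a PDE it means "2MB page", not supported on BDW), so pick
 * a PPAT index that resolves to WB+LLC without it. */
static u64 gen8_pde_encode(const dma_addr_t addr)
{
        u64 pde = _PAGE_PRESENT | _PAGE_RW;     /* valid, writable */

        return pde | addr | PPAT_CACHED_PDE_INDEX;      /* WB + LLC */
}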
Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Aside from the potential size increase of the PPGTT, the primary
difference from previous hardware is the Page Directories are no longer
carved out of the Global GTT.
Note that the PDE allocation is done as an 8MB contiguous allocation;
this eventually needs to be fixed (since driver reloading will be a
pain otherwise). Also, this will be a no-go for real PPGTT support.
v2: Move vtable initialization
v3: Resolve conflicts due to patch series reordering.
v4: Rebase on top of the address space refactoring of the PPGTT
support. Drop Imre's r-b tag for v2, too outdated by now.
v5: Free the correct amount of memory, "get_order takes size not a page
count." (Imre)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
BDW caching works differently than the previous generations. Instead of
having bits in the PTE which directly control how the page is cached,
the three PTE bits PWT, PCD, and PAT provide an index into a PAT defined by
register 0x40e0. This style of caching is functionally equivalent to how
it works on HSW and before.
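
A sketch of the PAT programming this implies (macro names approximate
the driver's; index 0 is WB per v3 below, and the register is written
as two 32b halves per v4):

static void gen8_setup_private_ppat(struct drm_i915_private *dev_priv)
{
        /* PWT/PCD/PAT in the PTE index into this table */
        u64 pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC) |
                  GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) |
                  GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) |
                  GEN8_PPAT(3, GEN8_PPAT_UC);

        /* Deliberately not a single 64b write; see the v4 note. */
        I915_WRITE(GEN8_PRIVATE_PAT, pat);              /* 0x40e0 */
        I915_WRITE(GEN8_PRIVATE_PAT + 4, pat >> 32);
}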
v2: Tiny bikeshed as discussed on internal irc.
v3: Squash in patch from Ville to mirror the x86 PAT setup more like
in arch/x86/mm/pat.c. Primarily, the 0th index will be WB, and not
uncached.
v4: Comment for reason to not use a 64b write on the PPAT.
v5: Add a FIXME comment that the caching bits in the PAT registers
might be wrong due to doc confusion.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the PTE clarifications, the bind and clear functions can now be
added for gen8.
v2: Use for_each_sg_pages in gen8_ggtt_insert_entries.
v3: Drop dev argument to pte encode functions, upstream lost it. Also
rebase on top of the scratch page movement.
v4: Rebase on top of the new address space vfuncs.
v5: Add the bool use_scratch argument to clear_range and the bool valid argument
to the PTE encode function to follow upstream changes.
v6: Add a FIXME(BDW) about the size mismatch of the readback check
that Jon Bloomfield spotted.
v7: Squash in fixup patch from Ben for the posting read to match the
64bit ptes and so shut up the WARN.
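
Sketched, the v7 fixup keeps both the PTE write and the posting read
64b wide (type and helper names approximate):

typedef u64 gen8_gtt_pte_t;

static inline void gen8_set_pte(gen8_gtt_pte_t __iomem *addr,
                                gen8_gtt_pte_t pte)
{
        writeq(pte, addr);      /* one 64b write per PTE */
}

/* ...and the trailing posting read matches the width, so the sanity
 * WARN on the readback (v6) stays quiet:
 *      WARN_ON(readq(&gtt_entries[i - 1]) != last_pte);
 */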
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With gen6 PTE type in place, pave the way for the new gen8 type.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Probing gen8 is similar to gen6. To make the code cleaner and more
maintainable, however, we can use the probe functions to split it out.
v2: Rebased on top of update gtt probe infrastructure.
v3: Rebased on top of Kenneth Graunke's ->pte_encode refactoring.
v4: Resolve conflicts with Ben's latest ppgtt patches, also switch to
gen < 8 testing instead of gen <= 7.
v5: Resolve conflicts with address space vfunc changes in upstream.
v6: Use 39b DMA mask. At least, for this mode, it is the correct mask.
(Imre)
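
The v6 change reduces to advertising exactly the PTE's address width
to the DMA API (modern helper shown for illustration; the
contemporary code used the pci_set_*_dma_mask variants):

/* gen8 PTEs carry 39 address bits in this mode */
err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(39));
if (err)
        return err;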
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
All the BARs have the ability to grow.
v2: Pulled out the simulator workaround to a separate patch.
Rebased.
v3: Rebase onto latest vlv patches from Jesse.
v4: Rebased on top of the early stolen quirk patch from Jesse.
v5: Use the new macro names.
s/INTEL_BDW_PCI_IDS_D/INTEL_BDW_D_IDS
s/INTEL_BDW_PCI_IDS_M/INTEL_BDW_M_IDS
It's Jesse's fault for not following the convention I originally set.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This will be changed once the gen8 code is fully implemented.
v2: Use ENOSYS instead of ENXIO as suggested by Chris.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I want to merge in the new Broadwell support as a late hw enabling
pull request. But since the internal branch was based upon our
drm-intel-nightly integration branch I need to resolve all the
outstanding conflicts in drm/i915 with a backmerge to make the 60+
patches apply properly.
We'll probably have some fun because Linus will come up with a
slightly different merge solution.
Conflicts:
drivers/gpu/drm/i915/i915_dma.c
drivers/gpu/drm/i915/i915_drv.c
drivers/gpu/drm/i915/intel_crt.c
drivers/gpu/drm/i915/intel_ddi.c
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_dp.c
drivers/gpu/drm/i915/intel_drv.h
All rather simple adjacent lines changed or partial backports from
-next to -fixes, with the exception of the thaw code in i915_dma.c.
That one needed a bit of shuffling to restore the intent.
Oh, and the massive header file reordering in intel_drv.h is a bit of
trouble. But not much.
v2: Also don't forget the fixup for the silent conflict that results
in compile fail ...
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Once the machine gets to a certain point in the suspend process, we
expect the GPU to be idle. If it is not, we might corrupt memory.
Empirically (with an early version of this patch) we have seen this is
not the case. We cannot currently explain why the latent GPU writes
occur.
In the technical sense, this patch is a workaround in that we have an
issue we can't explain, and the patch indirectly solves the issue.
However, it's really better than a workaround because we understand why
it works, and it really should be a safe thing to do in all cases.
The noticeable effect other than the debug messages would be an increase
in the suspend time. I have not measured how expensive it actually is.
I think it would be good to spend further time to root cause why we're
seeing these latent writes, but it shouldn't preclude preventing the
fallout.
NOTE: It should be safe (and makes some sense IMO) to also keep the
VALID bit unset on resume when we clear_range(). I've opted not to do
this as properly clearing those bits at some later point would be extra
work.
v2: Fix bugzilla link
Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=65496
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=59321
Tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-By: Todd Previte <tprevite@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need this to work around a corruption when the boot kernel image
loads the hibernated kernel image from swap on Haswell systems -
somehow not everything is properly shut off.
This is just the prep work, the next patch will implement the actual
workaround.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Add a commit message suitable for -fixes and add cc: stable]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No buffer overflows here, but better safe than sorry.
v2:
- Fixup the sizeof conversion, I've missed the pointer deref (Jani).
- Drop the redundant GFP_ZERO, kcalloc already memsets (Jani).
- Use kmalloc_array for the execbuf fastpath to avoid the memset
(Chris). I've opted to leave all other conversions as-is since they
aren't in a fastpath and dealing with cleared memory instead of
random garbage is just generally nicer.
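
For illustration, the pattern being converted (generic example, not a
specific hunk from this patch):

/* Before: the open-coded multiply can overflow on huge counts. */
ptr = kzalloc(count * sizeof(*ptr), GFP_KERNEL);

/* After: kcalloc overflow-checks the multiply and zeroes the memory. */
ptr = kcalloc(count, sizeof(*ptr), GFP_KERNEL);

/* Fastpaths that fully initialize the buffer anyway can use
 * kmalloc_array, which overflow-checks without the memset. */
ptr = kmalloc_array(count, sizeof(*ptr), GFP_KERNEL);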
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop the contentious kmalloc_array hunk in execbuf.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Haswell GT3e has the unique feature of supporting Write-Through caching
of objects within the eLLC/LLC. The purpose of this is to enable the display
plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
that we, in theory, get the best of both worlds, perfect display and fast
access.
However, we still need to be careful as the CPU does not see the WT when
accessing the cache. In particular, this means that we need to flush the
cache lines after writing to an object through the CPU, and on
transitioning from a cached state to WT.
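
The invariant can be sketched as follows (placement is illustrative;
I915_CACHE_WT and drm_clflush_sg() are the relevant identifiers):

/* The CPU does not see the WT attribute, so lines dirtied by CPU
 * writes must be flushed before the GPU or scanout reads them. */
if (obj->cache_level == I915_CACHE_WT)
        drm_clflush_sg(obj->pages);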
v2: Actually do the clflush on transition to WT, nagging by Ville.
v3: Flush the CPU cache after writes into WT objects.
v4: Rebase onto LLC updates and report WT as "uncached" for
get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As mentioned in the previous commit, reads and writes from both the CPU
and GPU go through the LLC. This gives us coherency between the CPU and
GPU irrespective of the attribute settings either device sets. We can
use this to avoid having to clflush even uncached memory.
Except for the scanout.
The scanout resides within another functional block that does not use
the LLC but reads directly from main memory. So in order to maintain
coherency with the scanout, writes to uncached memory must be flushed.
In order to optimize writes elsewhere, we start tracking whether a
framebuffer is attached to an object.
v2: Use pin_display tracking rather than fb_count (to ensure we flush
cursors as well etc) and only force the clflush along explicit writes to
the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).
v3: Force the flush after hitting the slowpath in pwrite, as after
dropping the lock the object's cache domain may be invalidated. (Ville)
Based on a patch by Ville Syrjälä.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
MLC_LLC was never validated for Sandybridge and was superseded by a new
level of caching for the GPU in Ivybridge. Update our names to be
consistent with usage, and in the process stop setting the unwanted bit
on Sandybridge.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: s/BUG/WARN_ON(1) bikeshed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Just some small cleanups, and a rename of vm->ggtt_vm requested by
Daniel.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Earlier in the conversion sequence we attempted to quickly wedge in the
transitional interface as static inlines.
Now that we're sure these interfaces are sane, for easier debug and to
decrease code size (since many of these functions may be called quite a
bit), make them real functions.
While at it, kill off the set_color interface. We'll always have the
VMA, or easily get to it.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The default LLC age was changed:
commit 0d8ff15e9a
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Thu Jul 4 11:02:03 2013 -0700
drm/i915/hsw: Set correct Haswell PTE encodings.
On the surface it would seem setting a default age wouldn't matter
because all GEM BOs are aged similarly, so the order in which objects
are evicted would not be subject to aging. The current working theory as
to why this caused a regression though is that LLC is a bit special in
that it is shared with the CPU. Presumably (not verified) the CPU
fetches cachelines with age 3, and therefore recently cached GPU objects
would be evicted first, before similar CPU objects, when the LLC is full.
It stands to reason therefore that this would negatively impact CPU
bound benchmarks - but those seem to be low on the priority list.
eLLC OTOH does not have this same property as LLC. It should be used
entirely for the GPU, and so the age really shouldn't matter.
Furthermore, we have no evidence to suggest one is better than another
on eLLC. Since we've never properly supported eLLC before now, there
should be no regression. If the GPU client really wants "younger"
objects, they should use MOCS.
v2: Drop the extra #define (Chad)
v3: Actually git add
v4: Pimped commit message
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67062
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The PTE layouts are the same for both ppgtt and gtt, so we can simplify
the setup for ppgtt by copying the encoding function pointer from gtt.
This prevents bugs where we update one function pointer, but forget the
other.
For instance,
commit 4d15c145a6
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Thu Jul 4 11:02:06 2013 -0700
drm/i915: Use eLLC/LLC by default when available
only extends the gtt to use eLLC/LLC caching and forgets to also update
the ppgtt function pointer.
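
The fix is essentially a single assignment at PPGTT init (field paths
abridged):

/* Share the GTT's encoder so the two can never drift apart again. */
ppgtt->base.pte_encode = dev_priv->gtt.base.pte_encode;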
v2: Actually mention the bug being fixed (Kenneth)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Formerly: "drm/i915: Create VMAs (part 1)"
In a previous patch, the notion of a VM was introduced. A VMA describes
an area of part of the VM address space. A VMA is similar to the concept
in the linux mm. However, instead of representing regular memory, a VMA
is backed by a GEM BO. There may be many VMAs for a given object, one
for each VM the object is to be used in. This may occur through flink,
dma-buf, or a number of other transient states.
Currently the code depends on only 1 VMA per object, for the global GTT
(and aliasing PPGTT). The following patches will address this and make
the rest of the infrastructure more suited to handling multiple VMAs.
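
Abridged, the new structure is roughly:

/* Sketch: one VMA per (object, VM) pair; field list abridged. */
struct i915_vma {
        struct drm_mm_node node;        /* where in the VM the BO sits */
        struct drm_i915_gem_object *obj;        /* backing GEM BO */
        struct i915_address_space *vm;  /* GGTT, aliasing PPGTT, ... */
        struct list_head vma_link;      /* chained off obj->vma_list */
};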
v2: s/i915_obj/i915_gem_obj (Chris)
v3: Only move an object to the now global unbound list if there are no
more VMAs for the object which are bound into a VM (ie. the list is
empty).
v4: killed obj->gtt_space
some reworks due to rebase
v5: Free vma on error path (Imre)
v6: Another missed vma free in i915_gem_object_bind_to_gtt error path
(Imre)
Fixed vma freeing in stolen preallocation (Imre)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Squash in fixup from Ben to not deref a non-existing vma in
set_cache_level, reported by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Every address space should support object allocation. It therefore makes
sense to have the allocator be part of the "superclass" which GGTT and
PPGTT will derive.
Since our maximum address space size is only 2GB we're not yet able to
avoid doing allocation/eviction; but we'd hope one day this becomes
almost irrelevant.
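
Abridged, the base class gains the allocator (a sketch; vfuncs elided):

struct i915_address_space {
        struct drm_mm mm;       /* allocator for [start, start + total) */
        struct drm_device *dev;
        unsigned long start;
        size_t total;
        /* clear_range, insert_entries, cleanup, ... */
};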
v2: Rebased
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The GTT and PPGTT can be thought of more generally as GPU address
spaces. Many of their actions (insert entries), state (LRU lists), and
many of their characteristics (size) can be shared. Do that.
The change itself doesn't actually impact most of the VMA/VM rework
coming up, it just fits in with the grand scheme of abstracting the GPU
VM operations. The GGTT will usually be a special case, where we know
an object must be in the GGTT (display engine, workarounds, etc.).
The scratch page is left as part of the VM (even though it's currently
shared with the ppgtt code) because in the future when we have Full
PPGTT, I intend to create a separate scratch page for each.
v2: Drop usage of i915_gtt_vm (Daniel)
Make cleanup also part of the parent class (Ben)
Modified commit msg
Rebased
v3: Properly share scratch page (Imre)
Finish commit message (Daniel, Imre)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
DRI clients really should be using MOCS to get fine grained streaming
cache controls. With that note, I *hope* that this patch doesn't improve
performance overwhelmingly, because if it does - it means there is a
problem elsewhere.
In any case, the kernel, and old userspace should get some benefit from
this, so let's do it. eLLC is always a good default, and really not
using it is the special case for MOCS.
References: http://www.intel.com/newsroom/kits/restricted/ha$well!/pdfs/4th_Gen_Intel_Core_PressBriefing_5-29.pdf (page 57)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The cacheability controls have changed, and the bits have been
rearranged in general.
Note that age 0 is the oldest (most likely to get evicted) and age 3
is the youngest (most likely to stick around for a bit). We've picked
0 for no reason, but atm it shouldn't matter anyway (since we don't
yet try to differentiate between different objects).
v2: Remove comments for snb/ivb cache levels, that's a separate change.
v3: Resolve conflicts due to patch series reordering.
v4: Rebased on top of Kenneth Graunke's ->pte_encode refactoring.
v5: Removed eLLC bits for separate patch.
In the internal repository this was:
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Add comment about cache ages as requested by Ben provoked due
to a question from Damien.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Embedding the node in the obj is more natural in the transition to VMAs
which will also have embedded nodes. This change also helps transition
away from put_block to remove node.
Though it's quite an uncommon occurrence, it's somewhat convenient to not
fail at bind time because we cannot allocate the node. Though in
practice there are other allocations (like the request structure) which
would probably make this point not terribly useful.
Quoting Daniel:
Note that the only difference between put_block and remove_node is
that the former fills up the preallocation cache. Which we don't need
anyway and hence is just wasted space.
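
With the node embedded, the unbind path shrinks to roughly (field
name assumed):

/* Nothing to kfree and no preallocation cache to refill. */
drm_mm_remove_node(&obj->gtt_space);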
v2: Clean up the stolen preallocation code.
Rebased on the reserve_node patches
renames ggtt_ stuff to gtt_ stuff
WARN_ON if the object is already bound (which doesn't mean it's in the
bound list, tricky)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the getters in place from the previous patch this member serves no
purpose other than saving one spare pointer chase, which will be killed
in the next patch anyway.
Moving to VMAs, this member adds unnecessary confusion since an object
may exist at different offsets in different VMs.
v2: Properly preserve the stolen offset. This code is a bit hacky but it
all goes away when we embed the drm_mm_node and removes the need for the
incorrect patch I submitted previously: "Use gtt_space->start for stolen
reservation"
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Soon we want to gut a lot of our existing assumptions how many address
spaces an object can live in, and in doing so, embed the drm_mm_node in
the object (and later the VMA).
It's possible in the future we'll want to add more getter/setter
methods, but for now this is enough to enable the VMAs.
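
The getters themselves are trivial wrappers, e.g. (a sketch; the
field name and indirection as of this patch are assumed):

static inline unsigned long
i915_gem_obj_ggtt_offset(struct drm_i915_gem_object *obj)
{
        return obj->gtt_space->start;   /* backing field can move later */
}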
v2: Reworked commit message (Ben)
Added comments to the main functions (Ben)
sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/*.[ch]
(Daniel)
v3: Rebased on new reserve_node patch
Changed DRM_DEBUG_KMS to actually work (will need fixing later)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the previous patch we no longer actually create a node, we simply
find the correct hole and occupy it. This very well could have been
squashed with the last patch, but since I already had David's review, I
figured it's easiest to keep it distinct.
Also update the users in i915. Conveniently this is the only user of the
interface.
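
Callers now fill in the desired placement and ask drm_mm to occupy
it; a minimal sketch:

static int reserve_stolen(struct drm_mm *stolen, struct drm_mm_node *node,
                          u64 start, u64 size)
{
        node->start = start;
        node->size = size;
        /* -ENOSPC if something already occupies that range */
        return drm_mm_reserve_node(stolen, node);
}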
CC: David Airlie <airlied@linux.ie>
CC: <dri-devel@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: David Airlie <airlied@linux.ie>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For an upcoming patch where we introduce the i915 VMA, it's ideal to
have the drm_mm_node as part of the VMA struct (ie. it's pre-allocated).
Part of the conversion to VMAs is to kill off obj->gtt_space. Doing this
will break a bunch of code, but amongst them are 2 callers of
drm_mm_create_block(), both related to stolen memory.
It also allows us to embed the drm_mm_node into the object for now,
which provides a nice transition over to the new code.
v2: Reordered to do before ripping out obj->gtt_offset.
Some minor cleanups made available because of reordering.
v3: s/continue/break on failed stolen node allocation (David)
Set obj->gtt_space on failed node allocation (David)
Only unref stolen (fix double free) on failed create_stolen (David)
Free node, and NULL it in failed create_stolen (David)
Add back accidentally removed newline (David)
CC: <dri-devel@lists.freedesktop.org>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: David Airlie <airlied@linux.ie>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The original pte_encode function needed the dev argument so we could do
platform specific handling via IS_GENX, etc. With the merging of the
pte encoding functions there should no longer be a need to quirk away
gen-specific details.
The patch doesn't do much but makes the upcoming reworks in gtt/ppgtt/mm
slightly (albeit, ever so) easier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There isn't any special reason to do this other than it makes it obvious
that the two members are connected.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>