linux

Commit Graph

Author	SHA1	Message	Date
Mika Kuoppala	581744626d	drm/i915: Add reason for capture in error state We capture error state not only when the GPU hangs but also on other situations as in interrupt errors and in situations where we can kick things forward without GPU reset. There will be log entry on most of these cases. But as error state capture might be only thing we have, if dmesg was not captured. Or as in GEN4 case, interrupt error can trigger error state capture without log entry, the exact reason why capture was made is hard to decipher. v2: Split out the the error code stuff to separate patch (Ben) References: https://bugs.freedesktop.org/show_bug.cgi?id=74193 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:26 +01:00
Mika Kuoppala	cb38300215	drm/i915: Add error code into error state commit `011cf577b2` Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Tue Feb 4 12:18:55 2014 +0000 drm/i915: Generate a hang error code added error code debug into dmesg. Store this also with error state to make matching dmesg logs and error states easier. As we need to have full ring state for error code generation, do full capture always, print hang message into log and then decide if we need to keep the error state. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:25 +01:00
Chris Wilson	ab0e7ff9f2	drm/i915: Record pid/comm of hanging task After finding the guilty batch and request, we can use it to find the process that submitted the batch and then add the culprit into the error state. This is a slightly different approach from Ben's in that instead of adding the extra information into the struct i915_hw_context, we use the information already captured in struct drm_file which is then referenced from the request. v2: Also capture the workaround buffer for gen2, so that we can compare its contents against the intended batch for the active request. v3: Rebase (Mika) v4: Check for null context (Chris) checkpatch warnings fixed Link: http://lists.freedesktop.org/archives/intel-gfx/2013-August/032280.html Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v4) Acked-by: Ben Widawsky <ben@bwidawsk.net> Cc: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:24 +01:00
Chris Wilson	8d9fc7fd2d	drm/i915: Rely on accurate request tracking for finding hung batches In the past, it was possible to have multiple batches per request due to a stray signal or ENOMEM. As a result we had to scan each active object (filtered by those having the COMMAND domain) for the one that contained the ACTHD pointer. This was then made more complicated by the introduction of ppgtt, whereby ACTHD then pointed into the address space of the context and so also needed to be taken into account. This is a fairly robust approach (though the implementation is a little fragile and depends upon the per-generation setup, registers and parameters). However, due to the requirements for hangstats, we needed a robust method for associating batches with a particular request and having that we can rely upon it for finding the associated batch object for error capture. If the batch buffer tracking is not robust enough, that should become apparent quite quickly through an erroneous error capture. That should also help to make sure that the runtime reporting to userspace is robust. It also means that we then report the oldest incomplete batch on each ring, which can be useful for determining the state of userspace at the time of a hang. v2: Use i915_gem_find_active_request (Mika) v3: remove check for ring->get_seqno, split long lines (Ben) v4: check that context is available (Chris) checkpatch warnings fixed Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v3) Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:24 +01:00
Ben Widawsky	011cf577b2	drm/i915: Generate a hang error code We get a large number of bugs which have a, "hey I have that too" because they see a GPU hang in dmesg. While two machines of the same model having a GPU hang is indeed a coincidence, it is far from enough evidence to suggest they are the same. In order to reduce this effect, and hopefully get people to file new bug reports, clearly the error message itself has been insufficient (see ref at the bottom for a new bug report with this characteristic). The algorithm is purposely pretty naive. I don't think we need much in order to avoid the problem I am trying to solve, and keeping it naive gives us some ability to make a decent test case. Cc: Jesse Barnes <jbarnes@virtuousgeek.org> References: https://bugs.freedesktop.org/show_bug.cgi?id=73276 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-05 17:17:10 +01:00
Chris Wilson	8b6124a633	drm/i915: Don't access snooped pages through the GTT (even for error capture) We want to use the GTT for reading back objects upon an error so that we have exactly the information that the GPU saw. However, it is verboten to access snoopable pages through the GTT and causes my PineView GPU to throw a page fault instead. This has not been a problem in the past as we only dumped ringbuffers and batchbuffers, both of which must be not snooped. However, the introduction of HWS page dumping leads to a read of a snooped object through the GTT. This was introduced by commit `f3ce382139` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jan 23 22:40:36 2014 +0000 drm/i915: Include HW status page in error capture Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet:s/uncached/not snooped/ for one case in the commit message as requested by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 17:25:39 +01:00
Chris Wilson	53a4c6b26d	drm/i915: Only print information for filing bug reports once Repeating the same information multiple times is just annoying. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 17:25:38 +01:00
Ben Widawsky	6c7a01ec37	drm/i915: Capture PPGTT info on error capture v2: Rebased upon cleaned up error state v3: Make sure hangcheck info remains last (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 13:01:02 +01:00
Ben Widawsky	91ec5d11ab	drm/i915: Add some more registers to error state Chris: Do we also want to capture? GAC_ECO_BITS /* gen6,7 / GAM_ECOCHK / gen6,7 / GAB_CTL / gen6 / GFX_MODE / gen6 */ Requested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 13:00:24 +01:00
Ben Widawsky	362b8af7ad	drm/i915: Move per ring error state to ring_error v2: Moved num_requests up (Chris) Rebased on new hws page capture which required a rename since it made two members named, 'hws' in the per ring error state. (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 13:00:14 +01:00
Ben Widawsky	654c90c678	drm/i915: Logically reorder error register capture Create logical sections in an attempt to clean up, and continue to keep future additions clean. v2: Reworded the comments. Added section headers (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 12:58:41 +01:00
Ben Widawsky	1d762aad7b	drm/i915: Extract register state error capture The code has become quite hairy. By relocating all the generic registers it will become more obvious where future ones should go. There is still admittedly a bit of confusion left for things like per ring registers. A subsequent patch will clean this function up. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 12:58:29 +01:00
Daniel Vetter	e515b47e56	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued Backmerge drm-next - I need to backmerge drm-intel-fixes patches touching the error capture code to be able to merge Ben's cleanup patches. Conflicts: drivers/gpu/drm/i915/i915_gpu_error.c Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 12:56:51 +01:00
Chris Wilson	f3ce382139	drm/i915: Include HW status page in error capture Many times in the past we have concluded that the cause of the GPU hang has been that the hw status page was stale, usually because the GPU and CPU disagreed over the address of the page. Having stumbled across yet another issue that seems to be related to the HWSP, it is time to include that information in the GPU error dump. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-27 17:16:40 +01:00
Chris Wilson	372fbb8e39	drm/i915: Decouple GPU error reporting from ring initialisation Currently we report through our error state only the rings that have been initialised (as detected by ring->obj). This check is done after the GPU reset and ring re-initialisation, which means that the software state may not be the same as when we captured the hardware error and we may not print out any of the vital information for debugging the hang. This (and the implied object leak) is a regression from commit `3d57e5bd12` Author: Ben Widawsky <ben@bwidawsk.net> Date: Mon Oct 14 10:01:36 2013 -0700 drm/i915: Do a fuller init after reset Note that we are already starting to get bug reports with incomplete error states from 3.13, which also hampers debugging userspace driver issues. v2: Prevent a NULL dereference on 830gm/845g after a GPU reset where the scratch obj may be NULL. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=74094 Cc: stable@vger.kernel.org # please don't delay since it's a vital support/debug feature for the intel gfx stack in general Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet: Add a bit of fluff to make it clear we need this expedited in stable.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-27 17:10:33 +01:00
Daniel Vetter	0e5539b923	Merge branch 'topic/ppgtt' into drm-intel-next-queued Because whatever.* * This should contain a fairly long list of issues and still unresolved resgressions, but I didn't really get a vote. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-25 21:14:57 +01:00
Ben Widawsky	7e0d96bc03	drm/i915: Use multiple VMs -- the point of no return As with processes which run on the CPU, the goal of multiple VMs is to provide process isolation. Specific to GEN, there is also the ability to map more objects per process (2GB each instead of 2Gb-2k total). For the most part, all the pipes have been laid, and all we need to do is remove asserts and actually start changing address spaces with the context switch. Since prior to this we've converted the setting of the page tables to a streamed version, this is quite easy. One important thing to point out (since it'd been hotly contested) is that with this patch, every context created will have it's own address space (provided the HW can do it). v2: Disable BDW on rebase NOTE: I tried to make this commit as small as possible. I needed one place where I could "turn everything on" and that is here. It could be split into finer commits, but I didn't really see much point. Cc: Eric Anholt <eric@anholt.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:24:52 +01:00
Ben Widawsky	d7f46fc4e7	drm/i915: Make pin count per VMA Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:49 +01:00
Ben Widawsky	685987c691	drm/i915: Identify active VM for batchbuffer capture Using the current state of the page directory registers, we can determine which of our address spaces was active when the hang occurred. This allows us to scan through all the address spaces to identify the "active" one during error capture. v2: Rebased for BDW error detection. BDW error detection is similar except instead of PP_DIR_BASE, we can use the PDP registers. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Add FIXME about global gtt misuse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:48 +01:00
Ben Widawsky	496bfcb9f1	drm/i915: Don't use gtt mapping for !gtt error objects The existing check was insufficient to determine whether we can use the GTT mapping to read out the object during error capture. The previous condition was, if the object has a GGTT mapping, and the reloc is in the GTT range... the can happen with opjects mapped into multiple vms (one of which being the GTT). There are two solutions to this problem: 1. This patch, which avoid reading the io mapping 2. Use the GGTT offset with the io mapping. Since error capture is about recording the most accurate possible error state, and the error was caused by the object not in the GGTT - I opted for the former. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:47 +01:00
Ben Widawsky	a7b910789f	drm/i915: Add vm to error BO capture formerly: drm/i915: Create VMAs (part 6) - finish error plumbing Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:47 +01:00
Ville Syrjälä	3dda20a974	drm/i915: Record BB_ADDR for every ring Every ring seems to have a BB_ADDR registers, so include them all in the error state. v2: Also include the _UDW on BDW Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-11 23:52:12 +01:00
Ville Syrjälä	0476190e10	drm/i915: Use 32bit read for BB_ADDR The BB_ADDR register is documented to be 32bits at least since SNB. Prior to that the high 32bits were listed as MBZ, so using a 64bit read doesn't seem worth anything. Also the simulator doesn't like the 64bit read. So just switch to using a 32bit read instead. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-11 23:52:12 +01:00
Ben Widawsky	d0582ed2ff	drm/i915/bdw: Update relevant error state Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-08 18:09:43 +01:00
Ben Widawsky	5ab31333ac	drm/i915/bdw: Fences on gen8 look just like gen7 Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-08 18:09:36 +01:00
Chris Wilson	94e39e282e	drm/i915: Capture batchbuffer state upon GPU hang The bbstate contains useful bits of debugging information such as whether the batch is being read from GTT or PPGTT, or whether it is allowed to execute privileged instructions. v2: Only record BB_STATE for gen4+ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-30 10:37:58 +01:00
Daniel Vetter	f468980171	drm/i915: Educate users in dmesg about reporting gpu hangs Untangling me-too reports that actually aren't is really messy. And we need to make sure the blame is put where it should be right from the start ;-) v2: Improve the wording from Ben's suggestions. Cc: Ben Widawsky <ben@bwidawsk.net> Acked-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Frob the message as suggested by Paulo on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 14:49:17 +02:00
Daniel Vetter	967ad7f148	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next The conflict in intel_drv.h tripped me up a bit since a patch in dinq moves all the functions around, but another one in drm-next removes a single function. So I'ev figured backing this into a backmerge would be good. i915_dma.c is just adjacent lines changed, nothing nefarious there. Conflicts: drivers/gpu/drm/i915/i915_dma.c drivers/gpu/drm/i915/intel_drv.h Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:44:43 +02:00
Ville Syrjälä	ffbab09bf9	drm: Remove pci_vendor and pci_device from struct drm_device We can get the PCI vendor and device IDs via dev->pdev. So we can drop the duplicated information. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-10-09 15:55:33 +10:00
Chris Wilson	094f9a54e3	drm/i915: Fix __wait_seqno to use true infinite timeouts When we switched to always using a timeout in conjunction with wait_seqno, we lost the ability to detect missed interrupts. Since, we have had issues with interrupts on a number of generations, and they are required to be delivered in a timely fashion for a smooth UX, it is important that we do log errors found in the wild and prevent the display stalling for upwards of 1s every time the seqno interrupt is missed. Rather than continue to fix up the timeouts to work around the interface impedence in wait_event_*(), open code the combination of wait_event[_interruptible][_timeout], and use the exposed timer to poll for seqno should we detect a lost interrupt. v2: In order to satisfy the debug requirement of logging missed interrupts with the real world requirments of making machines work even if interrupts are hosed, we revert to polling after detecting a missed interrupt. v3: Throw in a debugfs interface to simulate broken hw not reporting interrupts. v4: s/EGAIN/EAGAIN/ (Imre) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> [danvet: Don't use the struct typedef in new code.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-03 20:01:30 +02:00
Chris Wilson	f56383cb9f	drm/i915: Show WT caching in debugfs Add the missing cache-level to the describe_obj() function for debug and error reporting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-01 07:45:23 +02:00
Daniel Vetter	a1e2265332	drm/i915: Use kcalloc more No buffer overflows here, but better safe than sorry. v2: - Fixup the sizeof conversion, I've missed the pointer deref (Jani). - Drop the redundant GFP_ZERO, kcalloc alreads memsets (Jani). - Use kmalloc_array for the execbuf fastpath to avoid the memset (Chris). I've opted to leave all other conversions as-is since they aren't in a fastpath and dealing with cleared memory instead of random garbage is just generally nicer. Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> [danvet: Drop the contentious kmalloc_array hunk in execbuf.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-01 07:45:01 +02:00
Dave Airlie	4821ff14a3	Merge tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel into drm-next drm-intel-next-2013-09-21: - clock state handling rework from Ville - l3 parity handling fixes for hsw from Ben - some more watermark improvements from Ville - ban badly behaved context from Mika - a few vlv improvements from Jesse - VGA power domain handling from Ville drm-intel-next-2013-09-06: - Basic mipi dsi support from Jani. Not yet converted over to drm_bridge since that was too fresh, but the porting is in progress already. - More vma patches from Ben, this time the code to convert the execbuffer code. Now that the shrinker recursion bug is tracked down we can move ahead here again. Yay! - Optimize hw context switching to not generate needless interrupts (Chris Wilson). Also some shuffling for the oustanding request allocation. - Opregion support for SWSCI, although not yet fully wired up (we need a bit of runtime D3 support for that apparently, due to Windows design deficiencies), from Jani Nikula. - A few smaller changes all over. [airlied: merge conflict fix in i9xx_set_pipeconf] * tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel: (119 commits) drm/i915: assume all GM45 Acer laptops use inverted backlight PWM drm/i915: cleanup a min_t() cast drm/i915: Pull intel_init_power_well() out of intel_modeset_init_hw() drm/i915: Add POWER_DOMAIN_VGA drm/i915: Refactor power well refcount inc/dec operations drm/i915: Add intel_display_power_{get, put} to request power for specific domains drm/i915: Change i915_request power well handling drm/i915: POSTING_READ IPS_CTL before waiting for the vblank drm/i915: don't disable ERR_INT on the IRQ handler drm/i915/vlv: disable rc6p and rc6pp residency reporting on BYT drm/i915/vlv: honor i915_enable_rc6 boot param on VLV drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF drm/i915: Do remaps for all contexts drm/i915: Keep a list of all contexts drm/i915: Make l3 remapping use the ring drm/i915: Add second slice l3 remapping drm/i915: Fix HSW parity test drm/i915: dump crtc timings from the pipe config drm/i915: register backlight device also when backlight class is a module drm/i915: write D_COMP using the mailbox ... Conflicts: drivers/gpu/drm/i915/intel_display.c	2013-10-01 10:00:50 +10:00
Chris Wilson	e29bb4ebbf	drm/i915: Use a temporary va_list for two-pass string handling In commit `edc3d8848d` Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Thu May 23 13:55:35 2013 +0300 drm/i915: avoid big kmallocs on reading error state we introduce a two-pass mechanism for splitting long strings being formatted into the error-state. The first pass finds the length, and the second pass emits the right portion of the string into the accumulation buffer. Unfortunately we use the same va_list for both passes, resulting in the second pass reading garbage off the end of the argument list. As the two passes are only used for boundaries between read() calls, the corruption is only rarely seen. This fixes the root cause behind commit `baf27f9b17` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Jun 29 23:26:50 2013 +0100 drm/i915: Break up the large vsnprintf() in print_error_buffers() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-24 09:36:40 +02:00
Mika Kuoppala	da66146425	drm/i915: include hangcheck action and score in error_state Score and action reveals what all the rings were doing and why hang was declared. Add idle state so that we can distinguish between waiting and idle ring. v2: - add idle as a hangcheck action - consensed hangcheck status to single line (Chris) - mark active explicitly when we are making progress (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-06 17:56:17 +02:00
Chris Wilson	0d1aacac36	drm/i915: Embed the ring->private within the struct intel_ring_buffer We now have more devices using ring->private than not, and they all want the same structure. Worse, I would like to use a scratch page from outside of intel_ringbuffer.c and so for convenience would like to reuse ring->private. Embed the object into the struct intel_ringbuffer so that we can keep the code clean. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-03 19:17:55 +02:00
Ben Widawsky	4e5aabfd31	drm/i915: Get VECS semaphore info on error Ideally we could use for_each_ring with the ring flags as I've done a couple times (http://lists.freedesktop.org/archives/intel-gfx/2013-June/029450.html). Until Daniel merges that patch though, we can just use this. Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:43 +02:00
Ben Widawsky	95f5301dd8	drm/i915: Update error capture for VMs formerly: "drm/i915: Create VMAs (part 4) - Error capture" Since the active/inactive lists are per VM, we need to modify the error capture code to be aware of this, and also extend it to capture the buffers from all the VMs. For now all the code assumes only 1 VM, but it will become more generic over the next few patches. NOTE: If the number of VMs in a real world system grows significantly we'll have to focus on only capturing the guilty VM, or else it's likely there won't be enough space for error capture. v2: Squashed in the "part 6" which had dependencies on the mm_list change. Since I've moved the mm_list change to an earlier point in the series, we were able to accomplish it here and now. v3: Rebased over new error capture Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:08:37 +02:00
Ben Widawsky	ca191b1313	drm/i915: mm_list is per VMA formerly: "drm/i915: Create VMAs (part 5) - move mm_list" The mm_list is used for the active/inactive LRUs. Since those LRUs are per address space, the link should be per VMx . Because we'll only ever have 1 VMA before this point, it's not incorrect to defer this change until this point in the patch series, and doing it here makes the change much easier to understand. Shamelessly manipulated out of Daniel: "active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actual kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris) v3: Moved earlier in the series v4: Add dropped message from v3 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Frob patch to apply and use vma->node.size directly as discused with Ben. Also drop a needles BUG_ON before move_to_inactive, the function itself has the same check.] [danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA in destroy", specifically unlink the vma from the mm_list in vma_unbind (to keep it symmetric with bind_to_vm) instead of vma_destroy.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:06:58 +02:00
Chris Wilson	350ec881d9	drm/i915: Rename I915_CACHE_MLC_LLC to L3_LLC for Ivybridge MLC_LLC was never validated for Sandybridge and was superseded by a new level of cacheing for the GPU in Ivybridge. Update our names to be consistent with usage, and in the process stop setting the unwanted bit on Sandybridge. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet: s/BUG/WARN_ON(1) bikeshed.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-06 16:35:30 +02:00
Ben Widawsky	5cef07e162	drm/i915: Move active/inactive lists to new mm Shamelessly manipulated out of Daniel :-) "When moving the lists around explain that the active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actual kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: Leave the bound list as a global one. (Chris, indirectly) v3: Rebased with no i915_gtt_vm. In most places I added a new *vm local, since it will eventually be replaces by a vm argument. Put comment back inline, since it no longer makes sense to do otherwise. v4: Rebased on hangcheck/error state movement Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-17 22:24:32 +02:00
Mika Kuoppala	84734a049d	drm/i915: move error state to own compilation unit Move error state generation and stringification to it's own compilation unit. Sysfs also uses this so it can't be under CONFIG_DEBUG_FS This fixes a regression introduced in commit `ef86ddced7` Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Thu Jun 6 17:38:54 2013 +0300 drm/i915: add error_state sysfs entry Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66814 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-12 18:53:13 +02:00

42 Commits