
#1 2015-03-18 19:53:13

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

intel i915 and displayport mst freeze entire kernel

Hi folks,

My Lenovo laptop (T440s) came with the Ultra Dock, which has a DisplayPort connector on the back. I use a DisplayPort cable to connect to a monitor. When I connect to the dock, the monitor works just fine, but if I ever run "xrandr --output DP2-1 --off" or disconnect from the dock, the entire kernel hard locks. No response, the image on the screen is frozen, and I can't ssh to or ping the laptop from another computer on the same network.

I put drm.debug=0xf on the kernel command line and then reproduced the issue (it reproduces every time). This is the output I get right before the crash (see the kernel BUG line):

Mar 18 12:13:54 nevada kernel: [drm:drm_ioctl] pid=587, dev=0xe200, auth=1, I915_GEM_BUSY
Mar 18 12:13:54 nevada kernel: [drm:drm_ioctl] pid=587, dev=0xe200, auth=1, I915_GEM_BUSY
Mar 18 12:13:54 nevada kernel: [drm:drm_ioctl] pid=587, dev=0xe200, auth=1, I915_GEM_MADVISE
Mar 18 12:13:54 nevada kernel: [drm:drm_ioctl] pid=587, dev=0xe200, auth=1, I915_GEM_BUSY
Mar 18 12:13:54 nevada kernel: [drm:drm_ioctl] pid=587, dev=0xe200, auth=1, I915_GEM_BUSY
Mar 18 12:13:54 nevada kernel: [drm:drm_ioctl] pid=587, dev=0xe200, auth=1, I915_GEM_MADVISE
Mar 18 12:13:54 nevada kernel: [drm:drm_ioctl] pid=587, dev=0xe200, auth=1, I915_GEM_THROTTLE
Mar 18 12:13:54 nevada kernel: [drm:intel_hpd_irq_handler] hotplug event received, stat 0x00400000, dig 0x00101210
Mar 18 12:13:54 nevada kernel: [drm:intel_hpd_irq_handler] digital hpd port C - long
Mar 18 12:13:54 nevada kernel: [drm:intel_hpd_irq_handler] Received HPD interrupt on PIN 5 - cnt: 1
Mar 18 12:13:54 nevada kernel: [drm:intel_dp_hpd_pulse] got hpd irq on port C - long
Mar 18 12:13:54 nevada kernel: [drm:intel_hpd_irq_handler] hotplug event received, stat 0x00400000, dig 0x00101210
Mar 18 12:13:54 nevada kernel: [drm:intel_hpd_irq_handler] digital hpd port C - long
Mar 18 12:13:54 nevada kernel: [drm:intel_hpd_irq_handler] Received HPD interrupt on PIN 5 - cnt: 2
Mar 18 12:13:54 nevada kernel: [drm:intel_dp_get_dpcd] DPCD: 12 14 c4 01 00 15 01 83 02 00 00 00 00 00 04
Mar 18 12:13:54 nevada kernel: [drm:intel_dp_get_dpcd] Displayport TPS3 supported
Mar 18 12:13:54 nevada kernel: [drm:intel_dp_probe_oui] Sink OUI: 000000
Mar 18 12:13:54 nevada kernel: [drm:intel_dp_probe_oui] Branch OUI: 90cc24
Mar 18 12:13:54 nevada kernel: [drm:intel_dp_probe_mst] Sink is MST capable
Mar 18 12:13:54 nevada kernel: [drm:intel_dp_hpd_pulse] got hpd irq on port C - long
Mar 18 12:13:54 nevada kernel: [drm:intel_dp_hpd_pulse] MST device may have disappeared 1 vs 1
Mar 18 12:13:54 nevada kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000004c

There are no other lines after that kernel NULL pointer dereference at 0x4c.
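For anyone reading along, drm.debug is a bitmask of message categories. The sketch below decodes a mask into category names. The four bit values are what I remember from include/drm/drmP.h around v3.18, so treat them as an assumption and check your own tree; describe_drm_debug is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Category bits as remembered from include/drm/drmP.h around v3.18.
 * Assumption: verify against your own kernel tree. */
#define DRM_UT_CORE   0x01
#define DRM_UT_DRIVER 0x02
#define DRM_UT_KMS    0x04
#define DRM_UT_PRIME  0x08

/* Hypothetical helper: writes the names of the enabled categories,
 * separated by spaces, into out (at most len bytes including the NUL). */
void describe_drm_debug(unsigned int mask, char *out, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} categories[] = {
		{ DRM_UT_CORE, "core" },
		{ DRM_UT_DRIVER, "driver" },
		{ DRM_UT_KMS, "kms" },
		{ DRM_UT_PRIME, "prime" },
	};
	size_t used = 0;
	size_t i;

	assert(out != NULL && len > 0);
	out[0] = '\0';
	for (i = 0; i < sizeof(categories) / sizeof(categories[0]); i++) {
		int n;

		if (!(mask & categories[i].bit))
			continue;
		n = snprintf(out + used, len - used, "%s%s",
			     used ? " " : "", categories[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output buffer full, stop appending */
		used += (size_t)n;
	}
}
```

With these values, 0xf turns on all four categories, and a mask of 0x4 would keep only the KMS messages.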

I can attach or paste the full dmesg somewhere else if anyone is interested, but I'm confused by the second-to-last line there, "MST device may have disappeared 1 vs 1". Here is the code that prints it (linux-stable at tag v3.18.6):

From drivers/gpu/drm/i915/intel_dp.c:

mst_fail:
        /* if we were in MST mode, and device is not there get out of MST mode */
        if (intel_dp->is_mst) {
                DRM_DEBUG_KMS("MST device may have disappeared %d vs %d\n", intel_dp->is_mst, intel_dp->mst_mgr.mst_state);
                intel_dp->is_mst = false;
                drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr, intel_dp->is_mst);
        }
put_power:
        intel_display_power_put(dev_priv, power_domain);

        return ret;
}

So it's locking up either in drm_dp_mst_topology_mgr_set_mst (which has no debug messages in its main path and does take a lock) or in intel_display_power_put, which also has no debug output. They both acquire mutexes. Looking into this some more. If anyone has seen this issue with DisplayPort MST on Linux 3.18.6, please let me know.

Offline

#2 2015-03-18 21:05:50

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

Ok, writing a patch with debug messages to hopefully see exactly where this is locking up...

From 05a0a4758a98f47305165befa81eb61154e15676 Mon Sep 17 00:00:00 2001
From: Jeff Mickey <j@codemac.net>
Date: Wed, 18 Mar 2015 14:00:28 -0700
Subject: [PATCH] Debugging statements for figuring out this dp mst bug

Signed-off-by: Jeff Mickey <j@codemac.net>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 27 ++++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_dp.c       |  8 ++++++++
 drivers/gpu/drm/i915/intel_pm.c       |  3 +++
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index f50d884..b0ff4be 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1827,13 +1827,16 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 	int ret = 0;
 	struct drm_dp_mst_branch *mstb = NULL;
 
+	DRM_DEBUG_KMS("locking mgr->lock\n");
 	mutex_lock(&mgr->lock);
-	if (mst_state == mgr->mst_state)
+	if (mst_state == mgr->mst_state) {
+		DRM_DEBUG_KMS("goto out_unlock 1\n");
 		goto out_unlock;
-
+	}
 	mgr->mst_state = mst_state;
 	/* set the device into MST mode */
 	if (mst_state) {
+		DRM_DEBUG_KMS("inside mst_state\n");
 		WARN_ON(mgr->mst_primary);
 
 		/* get dpcd info */
@@ -1849,9 +1852,11 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 		mgr->avail_slots = mgr->total_slots;
 
 		/* add initial branch device at LCT 1 */
+		DRM_DEBUG_KMS("calling drm_dp_add_mst_branch_device\n");
 		mstb = drm_dp_add_mst_branch_device(1, NULL);
 		if (mstb == NULL) {
 			ret = -ENOMEM;
+			DRM_DEBUG_KMS("goto out_unlock 2\n");
 			goto out_unlock;
 		}
 		mstb->mgr = mgr;
@@ -1864,29 +1869,35 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 			struct drm_dp_payload reset_pay;
 			reset_pay.start_slot = 0;
 			reset_pay.num_slots = 0x3f;
+			DRM_DEBUG_KMS("drm_dp_dpcd_write_payload\n");
 			drm_dp_dpcd_write_payload(mgr, 0, &reset_pay);
 		}
 
+		DRM_DEBUG_KMS("drm_dp_dpcd_writeb\n");
 		ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
 					 DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC);
 		if (ret < 0) {
+			DRM_DEBUG_KMS("goto out_unlock 3\n");
 			goto out_unlock;
 		}
 
 
 		/* sort out guid */
+		DRM_DEBUG_KMS("drm_dp_dpcd_read\n");
 		ret = drm_dp_dpcd_read(mgr->aux, DP_GUID, mgr->guid, 16);
 		if (ret != 16) {
 			DRM_DEBUG_KMS("failed to read DP GUID %d\n", ret);
 			goto out_unlock;
 		}
 
+		DRM_DEBUG_KMS("drm_dp_validate_guid\n");
 		mgr->guid_valid = drm_dp_validate_guid(mgr, mgr->guid);
 		if (!mgr->guid_valid) {
+			DRM_DEBUG_KMS("drm_dp_dpcd_write 2\n");
 			ret = drm_dp_dpcd_write(mgr->aux, DP_GUID, mgr->guid, 16);
 			mgr->guid_valid = true;
 		}
-
+		DRM_DEBUG_KMS("queue_work\n");
 		queue_work(system_long_wq, &mgr->work);
 
 		ret = 0;
@@ -1895,6 +1906,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 		mstb = mgr->mst_primary;
 		mgr->mst_primary = NULL;
 		/* this can fail if the device is gone */
+		DRM_DEBUG_KMS("drm_dp_dpcd_writeb 2\n");
 		drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL, 0);
 		ret = 0;
 		memset(mgr->payloads, 0, mgr->max_payloads * sizeof(struct drm_dp_payload));
@@ -1904,11 +1916,16 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 	}
 
 out_unlock:
+	DRM_DEBUG_KMS("unlocking mgr->lock\n");
 	mutex_unlock(&mgr->lock);
-	if (mstb)
+	if (mstb) {
+		DRM_DEBUG_KMS("drm_dp_put_mst_branch_device 2\n");
 		drm_dp_put_mst_branch_device(mstb);
-	return ret;
+	}
+
 
+	DRM_DEBUG_KMS("returning %d\n", ret);
+	return ret;
 }
 EXPORT_SYMBOL(drm_dp_mst_topology_mgr_set_mst);
 
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 4bcd917..5a3c562 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -4523,24 +4523,29 @@ intel_dp_hpd_pulse(struct intel_digital_port *intel_dig_port, bool long_hpd)
 
 		if (HAS_PCH_SPLIT(dev)) {
 			if (!ibx_digital_port_connected(dev_priv, intel_dig_port))
+				DRM_DEBUG_KMS("goto mst_fail 1\n");
 				goto mst_fail;
 		} else {
 			if (g4x_digital_port_connected(dev, intel_dig_port) != 1)
+				DRM_DEBUG_KMS("goto mst_fail 2\n");
 				goto mst_fail;
 		}
 
 		if (!intel_dp_get_dpcd(intel_dp)) {
+			DRM_DEBUG_KMS("goto mst_fail 3\n");
 			goto mst_fail;
 		}
 
 		intel_dp_probe_oui(intel_dp);
 
 		if (!intel_dp_probe_mst(intel_dp))
+			DRM_DEBUG_KMS("goto mst_fail 4\n");
 			goto mst_fail;
 
 	} else {
 		if (intel_dp->is_mst) {
 			if (intel_dp_check_mst_status(intel_dp) == -EINVAL)
+				DRM_DEBUG_KMS("goto mst_fail 5\n");
 				goto mst_fail;
 		}
 
@@ -4549,6 +4554,7 @@ intel_dp_hpd_pulse(struct intel_digital_port *intel_dig_port, bool long_hpd)
 			 * we'll check the link status via the normal hot plug path later -
 			 * but for short hpds we should check it now
 			 */
+			DRM_DEBUG_KMS("drm_modeset_lock\n");
 			drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
 			intel_dp_check_link_status(intel_dp);
 			drm_modeset_unlock(&dev->mode_config.connection_mutex);
@@ -4566,6 +4572,8 @@ mst_fail:
 put_power:
 	intel_display_power_put(dev_priv, power_domain);
 
+	DRM_DEBUG_KMS("Returning %d as ret\n", ret);
+
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 83c7ecf..c5e6f33 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6555,6 +6555,7 @@ void intel_display_power_put(struct drm_i915_private *dev_priv,
 
 	power_domains = &dev_priv->power_domains;
 
+	DRM_DEBUG_KMS("locking power_domains->lock\n");
 	mutex_lock(&power_domains->lock);
 
 	WARN_ON(!power_domains->domain_use_count[domain]);
@@ -6570,8 +6571,10 @@ void intel_display_power_put(struct drm_i915_private *dev_priv,
 		}
 	}
 
+	DRM_DEBUG_KMS("unlocking power_domains->lock\n");
 	mutex_unlock(&power_domains->lock);
 
+	DRM_DEBUG_KMS("intel_runtime_pm_put is being called\n");
 	intel_runtime_pm_put(dev_priv);
 }
 
-- 
2.3.3

Offline

#3 2015-03-19 02:11:07

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

Hm... and now when I recompile 3.18.6 with this patch, it can't detect DisplayPort displays at all... Do you have to compile mesa with the correct kernel installed?

Offline

#4 2015-03-20 16:06:10

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

This also happens on 3.19.2.

It's weird how just adding DRM_DEBUG_KMS lines somehow means that the second display isn't even found. Maybe I'm building the linux package incorrectly?
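One possible explanation, going only by the patch in post #2: four of the five "goto mst_fail" debug lines in intel_dp.c were added between a brace-less if and its goto. In C that makes the debug call the body of the if, and the goto then runs unconditionally, so the long-pulse path would always bail out to mst_fail. The sketch below reproduces the pattern in userspace; DEBUG_MSG, probe_unbraced and probe_braced are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for DRM_DEBUG_KMS: a single statement that logs. */
#define DEBUG_MSG(msg) fprintf(stderr, "%s\n", (msg))

/* Same shape as the patched hunks: the log line becomes the body of the
 * brace-less if, so the goto below it runs on every call. */
int probe_unbraced(int connected)
{
	if (!connected)
		DEBUG_MSG("goto fail 1");
		goto fail;	/* not guarded by the if any more */

	return 0;		/* never reached */
fail:
	return -1;
}

/* With braces, the goto is only taken when the port is not connected. */
int probe_braced(int connected)
{
	if (!connected) {
		DEBUG_MSG("goto fail 1");
		goto fail;
	}

	return 0;
fail:
	return -1;
}

/* Usage: the unbraced version fails even for a connected port. */
void compare_probes(void)
{
	assert(probe_unbraced(1) == -1);
	assert(probe_braced(1) == 0);
	assert(probe_braced(0) == -1);
}
```

Newer GCC versions can flag this pattern with -Wmisleading-indentation. If this is what happened, wrapping each added debug line and its goto in braces should bring display detection back.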

Offline

#5 2015-04-12 23:07:53

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

This is also happening on 3.19.3. When I add piles of extra debug statements, things don't freeze as quickly, which suggests some kind of data race in the i915 MST code related to DisplayPort power. I wish I had a better repro.

Offline

#6 2015-05-15 18:33:59

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

Still happening on 4.0.2.

Hoping other people with a Lenovo T440s and dock can help me out at some point.

Offline

#7 2015-05-18 20:42:55

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

I now know it is specific to powering off a DisplayPort display!

When I connect my laptop to the dock and then run xrandr --auto, things work fine, but if a screen goes to sleep or I run xrandr --output DP2-2 --off, the kernel locks up. Turning on NMI didn't print anything interesting in the logs, though.

Offline

#8 2015-05-28 06:50:57

Osiris
Member
Registered: 2003-01-18
Posts: 148
Website

Re: intel i915 and displayport mst freeze entire kernel

I am having the same problem and I have no solution. There seem to be two versions of that dock; the older one does not support DP and DVI connected at the same time. I'm using two monitors by using the mini-HDMI connector on the laptop. However, I have never experienced any crashes.

Offline

#9 2015-06-18 01:42:45

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

Jun 17 18:33:08 nevada kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000004c
Jun 17 18:33:08 nevada kernel: IP: [<ffffffffa05a7133>] drm_dp_check_and_send_link_address+0x13/0xa0 [drm_kms_helper]
Jun 17 18:33:08 nevada kernel: PGD 0 
Jun 17 18:33:08 nevada kernel: Oops: 0000 [#1] PREEMPT SMP 
Jun 17 18:33:08 nevada kernel: Modules linked in: fuse ctr ccm xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat nf_conntrack overlay tun nls_iso8859_1 nls_cp437 vfat fat j
Jun 17 18:33:08 nevada kernel:  drm_kms_helper cfg80211 snd_hda_controller memstick thinkpad_acpi drm snd_hda_codec wmi thermal snd_hwdep snd_pcm intel_gtt nvram snd_timer led_class i2c_algo_bit e1000e tpm_tis battery rfkill button snd ac tpm mei_me i2c_core video ptp mei pps_core s
Jun 17 18:33:08 nevada kernel: CPU: 1 PID: 548 Comm: kworker/1:3 Tainted: G           O    4.0.5-1-ARCH #1
Jun 17 18:33:08 nevada kernel: Hardware name: LENOVO 20AQCTO1WW/20AQCTO1WW, BIOS GJET67WW (2.17 ) 12/10/2013
Jun 17 18:33:08 nevada kernel: Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
Jun 17 18:33:08 nevada kernel: task: ffff880309a09440 ti: ffff880309af8000 task.ti: ffff880309af8000
Jun 17 18:33:08 nevada kernel: RIP: 0010:[<ffffffffa05a7133>]  [<ffffffffa05a7133>] drm_dp_check_and_send_link_address+0x13/0xa0 [drm_kms_helper]
Jun 17 18:33:08 nevada kernel: RSP: 0018:ffff880309afbdc8  EFLAGS: 00010286
Jun 17 18:33:08 nevada kernel: RAX: ffff88031e258205 RBX: ffff88030e775300 RCX: ffff88031e253658
Jun 17 18:33:08 nevada kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003776b600
Jun 17 18:33:08 nevada kernel: RBP: ffff880309afbde8 R08: ffff88031e253640 R09: 0000000000000001
Jun 17 18:33:08 nevada kernel: R10: 0000000000000002 R11: 00000000ffff146c R12: ffff88031e253640
Jun 17 18:33:08 nevada kernel: R13: ffff88003776b600 R14: 0000000000000000 R15: ffff88003776b9d8
Jun 17 18:33:08 nevada kernel: FS:  0000000000000000(0000) GS:ffff88031e240000(0000) knlGS:0000000000000000
Jun 17 18:33:08 nevada kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 17 18:33:08 nevada kernel: CR2: 000000000000004c CR3: 000000000180b000 CR4: 00000000001407e0
Jun 17 18:33:08 nevada kernel: Stack:
Jun 17 18:33:08 nevada kernel:  ffff88030e775300 ffff88031e253640 ffff88031e258200 0000000000000000
Jun 17 18:33:08 nevada kernel:  ffff880309afbdf8 ffffffffa05a71dc ffff880309afbe48 ffffffff8108da3b
Jun 17 18:33:08 nevada kernel:  ffff88031e253640 0000000000000000 ffff88031e253658 ffff88030e775330
Jun 17 18:33:08 nevada kernel: Call Trace:
Jun 17 18:33:08 nevada kernel:  [<ffffffffa05a71dc>] drm_dp_mst_link_probe_work+0x1c/0x20 [drm_kms_helper]
Jun 17 18:33:08 nevada kernel:  [<ffffffff8108da3b>] process_one_work+0x14b/0x470
Jun 17 18:33:08 nevada kernel:  [<ffffffff8108e188>] worker_thread+0x48/0x4b0
Jun 17 18:33:08 nevada kernel:  [<ffffffff8108e140>] ? init_pwq.part.7+0x10/0x10
Jun 17 18:33:08 nevada kernel:  [<ffffffff8108e140>] ? init_pwq.part.7+0x10/0x10
Jun 17 18:33:08 nevada kernel:  [<ffffffff81093418>] kthread+0xd8/0xf0
Jun 17 18:33:08 nevada kernel:  [<ffffffff81093340>] ? kthread_worker_fn+0x170/0x170
Jun 17 18:33:08 nevada kernel:  [<ffffffff8157a4d8>] ret_from_fork+0x58/0x90
Jun 17 18:33:08 nevada kernel:  [<ffffffff81093340>] ? kthread_worker_fn+0x170/0x170
Jun 17 18:33:08 nevada kernel: Code: 94 f7 ff e9 53 fe ff ff b8 f4 ff ff ff e9 72 fe ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd <80> 7e 4c 00 49 89 f6 74 6c 49 8b 46 18 4d 8d 66 18 49 39 c4 48 
Jun 17 18:33:08 nevada kernel: RIP  [<ffffffffa05a7133>] drm_dp_check_and_send_link_address+0x13/0xa0 [drm_kms_helper]
Jun 17 18:33:08 nevada kernel:  RSP <ffff880309afbdc8>
Jun 17 18:33:08 nevada kernel: CR2: 000000000000004c
Jun 17 18:33:08 nevada kernel: ---[ end trace f8467360a3bcc93e ]---

Finally got the journal to record the actual panic! Hurrah! Investigating more, may file a bug with Arch Linux.
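A note on reading this oops. CR2 is 0x4c and RSI is 0, and the faulting bytes marked <80> 7e 4c 00 in the Code line decode to a byte compare at offset 0x4c from RSI. RSI carries the second function argument on x86-64, so it looks like drm_dp_check_and_send_link_address was handed a NULL second argument and tried to read a field 0x4c bytes into it. The call trace shows it was called from drm_dp_mst_link_probe_work on a workqueue, and the patch context in post #2 shows the disable path setting mgr->mst_primary = NULL, so one guess is that the probe work ran after that pointer had been cleared. The sketch below only illustrates the arithmetic and a NULL guard; struct fake_branch is a hypothetical layout, not the real struct drm_dp_mst_branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: NOT the real kernel struct. The padding is chosen
 * only so that one flag lands at offset 0x4c, matching CR2 in the oops. */
struct fake_branch {
	unsigned char pad[0x4c];
	unsigned char flag;	/* stand-in for whatever field lives at 0x4c */
};

/* Address the CPU touches when reading flag through pointer value base. */
uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct fake_branch, flag);
}

/* Guarded probe step: returns -1 instead of dereferencing a NULL branch. */
int probe_branch(const struct fake_branch *branch)
{
	if (branch == NULL)
		return -1;	/* device already gone, nothing to probe */
	return branch->flag ? 1 : 0;
}

/* Usage: a NULL base plus the field offset gives the faulting address. */
void check_fault_address(void)
{
	assert(fault_address(0) == 0x4c);
	assert(probe_branch(NULL) == -1);
}
```

In the kernel a bare NULL check would not be enough on its own if the pointer can change underneath the worker; the lookup would presumably also need to happen under the manager's lock.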

Offline

#10 2015-06-18 01:58:25

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

Offline

#11 2015-06-19 01:01:00

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 790
Website

Re: intel i915 and displayport mst freeze entire kernel

https://bugs.freedesktop.org/show_bug.cgi?id=89366


Testing the patch provided by the author of the DisplayPort MST code! Will report back ASAP if it works.

Offline
