You are not logged in.
Pages: 1
Hi folks,
from time to time a call trace occurs on my system which then makes it unusable for a couple of minutes. This time the issue occurred while Steam was running and downloading a file at 20-30Mb/s. Afterwards I quit it and copied a file from my NFS server (4Gb@90Mb/s) and had no isses as some forum entries suspect a hardware issue with similar call trace messages.
I watched the journal and found this a few times:
Dez 29 15:38:02 kernemonster evolution[24937]: failed to commit changes to dconf: Zeitüberschreitung wurde erreicht
Dez 29 15:38:27 kernemonster evolution[24937]: failed to commit changes to dconf: Zeitüberschreitung wurde erreicht
Dez 29 15:38:52 kernemonster evolution[24937]: failed to commit changes to dconf: Zeitüberschreitung wurde erreicht
Dez 29 15:39:51 kernemonster kernel: INFO: task StreamTrans #13:41695 blocked for more than 122 seconds.
Dez 29 15:39:51 kernemonster kernel: Tainted: P OE 5.4.6-arch3-1 #1
Dez 29 15:39:51 kernemonster kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dez 29 15:39:51 kernemonster kernel: StreamTrans #13 D 0 41695 15083 0x00004080
Dez 29 15:39:51 kernemonster kernel: Call Trace:
Dez 29 15:39:51 kernemonster kernel: ? __schedule+0x2f3/0x770
Dez 29 15:39:51 kernemonster kernel: schedule+0x43/0xd0
Dez 29 15:39:51 kernemonster kernel: jbd2_log_wait_commit+0xc7/0x130 [jbd2]
Dez 29 15:39:51 kernemonster kernel: ? wait_woken+0x70/0x70
Dez 29 15:39:51 kernemonster kernel: ext4_sync_file+0x3b1/0x400 [ext4]
Dez 29 15:39:51 kernemonster kernel: do_fsync+0x38/0x70
Dez 29 15:39:51 kernemonster kernel: __x64_sys_fsync+0x10/0x20
Dez 29 15:39:51 kernemonster kernel: do_syscall_64+0x4e/0x140
Dez 29 15:39:51 kernemonster kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dez 29 15:39:51 kernemonster kernel: RIP: 0033:0x7fe85771abfb
Dez 29 15:39:51 kernemonster kernel: Code: Bad RIP value.
Dez 29 15:39:51 kernemonster kernel: RSP: 002b:00007fe83c4d2ad0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Dez 29 15:39:51 kernemonster kernel: RAX: ffffffffffffffda RBX: 00007fe835e14940 RCX: 00007fe85771abfb
Dez 29 15:39:51 kernemonster kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002a
Dez 29 15:39:51 kernemonster kernel: RBP: 00007fe83683c0e0 R08: 0000000000000000 R09: 00007fe8376427e0
Dez 29 15:39:51 kernemonster kernel: R10: fffffffffffffff0 R11: 0000000000000293 R12: 0000000000000000
Dez 29 15:39:51 kernemonster kernel: R13: 00007fe8509effc0 R14: 00007fe83683c0a0 R15: 0000000000000000
I installed smartmontools and did a short test with this result:
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.4.6-arch3-1] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2311 -
# 2 Vendor (0xff) Completed without error 00% 2306 -
# 3 Vendor (0xff) Completed without error 00% 2305 -
# 4 Vendor (0xff) Completed without error 00% 2304 -
# 5 Vendor (0xff) Completed without error 00% 2302 -
# 6 Vendor (0xff) Completed without error 00% 2301 -
# 7 Vendor (0xff) Completed without error 00% 2298 -
# 8 Vendor (0xff) Completed without error 00% 2297 -
# 9 Vendor (0xff) Completed without error 00% 2292 -
#10 Vendor (0xff) Completed without error 00% 2290 -
#11 Vendor (0xff) Completed without error 00% 2288 -
#12 Vendor (0xff) Completed without error 00% 2287 -
#13 Vendor (0xff) Completed without error 00% 2285 -
#14 Vendor (0xff) Completed without error 00% 2284 -
#15 Vendor (0xff) Completed without error 00% 2283 -
#16 Vendor (0xff) Completed without error 00% 2280 -
#17 Vendor (0xff) Completed without error 00% 2277 -
#18 Vendor (0xff) Completed without error 00% 2276 -
#19 Vendor (0xff) Completed without error 00% 2275 -
#20 Vendor (0xff) Completed without error 00% 2274 -
#21 Vendor (0xff) Completed without error 00% 2272 -
The disk is partition according to the WIKI part "LUKS on a partition".
Some bug reports mention a problem with 4.18 and Ryzen boards. I also have a Ryzen 1700 but run the current kernel (5.4.6-arch3-1)
Might this be related to Steam's way of allocating/manging the disk space? Or even worse a problem with the partitions? How can I track this down?
Thanks in advance
Offline
Some time ago I also experienced issues (system is unusable for 1-2 minutes) when writing large files (more than 700MiB-1GiB) and journal log contained traces related to jdb2 (like yours jbd2_log_wait_commit+0xc7/0x130 [jbd2]). In my case setup was complicated by lvm on top of luks, so debugging this issue was hard and I didn't reach anything.
Some bug reports mention a problem with 4.18 and Ryzen boards. I also have a Ryzen 1700 but run the current kernel (5.4.6-arch3-1)
Might this be related to Steam's way of allocating/manging the disk space? Or even worse a problem with the partitions? How can I track this down?
I don't think it is related with hardware. You can try to copy around sufficiently large files (if your filesystem is not COW)to test whether this bug occurs only with Steam.
I installed smartmontools and did a short test with this result:
Short test or long test?
Offline
Again I started to analyze the issue a little bit. In the last test I copied a 40Gb+ file over NFS and while it started to transfer at 90Mb/s after approx. 15Gb the rate decreased to 50-60Mb/s and after around 30Gb the Desktop freeze came back for a few seconds. The log during the copy process looks like this:
Okt 23 14:35:02 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:35:02 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:35:02 kernemonster tracker-extract[4217]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:35:02 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:35:02 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:35:04 kernemonster systemd[1357]: Started Application launched by gnome-shell.
Okt 23 14:35:05 kernemonster evolution-alarm[4233]: Your application did not unregister from D-Bus before destruction. Consider using g_application_run().
Okt 23 14:35:13 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:35:31 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c0010d.
Okt 23 14:35:31 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c00114.
Okt 23 14:35:32 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c0011b.
Okt 23 14:35:34 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c00114.
Okt 23 14:35:35 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c0011b.
Okt 23 14:35:40 kernemonster rtkit-daemon[1297]: Supervising 3 threads of 2 processes of 1 users.
Okt 23 14:35:40 kernemonster rtkit-daemon[1297]: Supervising 3 threads of 2 processes of 1 users.
Okt 23 14:36:30 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:36:30 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:36:30 kernemonster tracker-extract[4448]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:36:30 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:36:30 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:36:41 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:37:11 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:37:11 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:37:11 kernemonster tracker-extract[4474]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:37:11 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:37:11 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:37:22 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:37:22 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:37:22 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:37:23 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:37:23 kernemonster unknown[4487]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files disappears >
Okt 23 14:37:23 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:37:33 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:39:43 kernemonster systemd[1357]: Started VTE child process 4557 launched by gnome-terminal-server process 2741.
Okt 23 14:39:47 kernemonster rtkit-daemon[1297]: Supervising 3 threads of 2 processes of 1 users.
Okt 23 14:39:47 kernemonster rtkit-daemon[1297]: Supervising 3 threads of 2 processes of 1 users.
Okt 23 14:39:59 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:39:59 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:39:59 kernemonster tracker-extract[4627]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:39:59 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:39:59 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:40:10 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:40:10 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:40:11 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:40:11 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:40:11 kernemonster unknown[4638]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files disappears >
Okt 23 14:40:11 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:40:21 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:41:43 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:41:43 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:41:43 kernemonster tracker-extract[4677]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:41:43 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:41:43 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:41:44 kernemonster systemd[1357]: vte-spawn-6e1b176d-8a38-4f46-b1df-6f9055c1859c.scope: Succeeded.
Okt 23 14:41:44 kernemonster audit[2757]: USER_END pid=2757 uid=0 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_unix acct="root" exe="/usr/bin/su" hostname=kernemonster addr=? ter>
Okt 23 14:41:44 kernemonster kernel: audit: type=1106 audit(1603456904.469:180): pid=2757 uid=0 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_unix acct="root" exe="/usr/bin/su" ho>
Okt 23 14:41:44 kernemonster kernel: audit: type=1104 audit(1603456904.469:181): pid=2757 uid=0 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/bin/su" hostna>
Okt 23 14:41:44 kernemonster kernel: audit: type=1106 audit(1603456904.469:182): pid=2754 uid=0 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_limits,pam_unix,pam_permit acct="root>
Okt 23 14:41:44 kernemonster kernel: audit: type=1104 audit(1603456904.469:183): pid=2754 uid=0 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_u2f acct="root" exe="/usr/bin/sudo" hostnam>
Okt 23 14:41:44 kernemonster audit[2757]: CRED_DISP pid=2757 uid=0 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/bin/su" hostname=kernemonster addr=? termin>
Okt 23 14:41:44 kernemonster audit[2754]: USER_END pid=2754 uid=0 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname>
Okt 23 14:41:44 kernemonster audit[2754]: CRED_DISP pid=2754 uid=0 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_u2f acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/>
Okt 23 14:41:44 kernemonster su[2757]: pam_unix(su:session): session closed for user root
Okt 23 14:41:44 kernemonster systemd[1357]: vte-spawn-ab4ca689-29ac-4821-8c16-6dffa7b7aadf.scope: Succeeded.
This time the call trace didn't show up (but the freezes did). It doesn't seem to be related to Steam (like assumed in the beginning) as it wasn't involved this time. The same machine didn't have issues when copying the file from Windows 10 over SMB.
The output from smartctl -t long:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.1-arch1-1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
Device Model: Crucial_CT480M500SSD1
Serial Number: 1351095EAB0F
LU WWN Device Id: 5 00a075 1095eab0f
Firmware Version: MU05
User Capacity: 480.103.981.056 bytes [480 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Oct 23 16:45:24 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x85) Offline data collection activity
was aborted by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 2235) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 37) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x0035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 1
5 Reallocate_NAND_Blk_Cnt 0x0033 100 100 000 Pre-fail Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 2427
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2227
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Ave_Block-Erase_Count 0x0032 098 098 000 Old_age Always - 75
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 222
180 Unused_Reserve_NAND_Blk 0x0033 000 000 000 Pre-fail Always - 8219
183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age Always - 0
184 Error_Correction_Count 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 071 043 000 Old_age Always - 29 (Min/Max 0/57)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
202 Percent_Lifetime_Remain 0x0031 098 098 000 Pre-fail Offline - 2
206 Write_Error_Rate 0x000e 100 100 000 Old_age Always - 0
210 Success_RAIN_Recov_Cnt 0x0032 100 100 000 Old_age Always - 0
246 Total_LBAs_Written 0x0032 100 100 --- Old_age Always - 13559275604
247 Host_Program_Page_Count 0x0032 100 100 --- Old_age Always - 427303352
248 FTL_Program_Page_Count 0x0032 100 100 --- Old_age Always - 1934431862
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 2427 -
# 2 Vendor (0xff) Completed without error 00% 2425 -
# 3 Vendor (0xff) Completed without error 00% 2422 -
# 4 Vendor (0xff) Completed without error 00% 2416 -
# 5 Vendor (0xff) Completed without error 00% 2414 -
# 6 Vendor (0xff) Completed without error 00% 2413 -
# 7 Vendor (0xff) Completed without error 00% 2404 -
# 8 Vendor (0xff) Completed without error 00% 2401 -
# 9 Vendor (0xff) Completed without error 00% 2400 -
#10 Vendor (0xff) Completed without error 00% 2398 -
#11 Vendor (0xff) Completed without error 00% 2396 -
#12 Vendor (0xff) Completed without error 00% 2394 -
#13 Vendor (0xff) Completed without error 00% 2390 -
#14 Vendor (0xff) Completed without error 00% 2387 -
#15 Vendor (0xff) Completed without error 00% 2386 -
#16 Vendor (0xff) Completed without error 00% 2382 -
#17 Vendor (0xff) Completed without error 00% 2379 -
#18 Vendor (0xff) Completed without error 00% 2377 -
#19 Vendor (0xff) Completed without error 00% 2366 -
#20 Vendor (0xff) Completed without error 00% 2353 -
#21 Vendor (0xff) Completed without error 00% 2352 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
How could I analyze i.e. if luks is working correctly and not having issues that don't show up in the log?
Offline
Pages: 1