You are not logged in.

#1 2019-12-29 17:58:03

two_penguins
Member
Registered: 2019-12-29
Posts: 13

Call trace related to ext4

Hi folks,

from time to time a call trace occurs on my system which then makes it unusable for a couple of minutes. This time the issue occurred while Steam was running and downloading a file at 20-30Mb/s. Afterwards I quit it and copied a file from my NFS server (4Gb@90Mb/s) and had no isses as some forum entries suspect a hardware issue with similar call trace messages.
I watched the journal and found this a few times:

Dez 29 15:38:02 kernemonster evolution[24937]: failed to commit changes to dconf: Zeitüberschreitung wurde erreicht
Dez 29 15:38:27 kernemonster evolution[24937]: failed to commit changes to dconf: Zeitüberschreitung wurde erreicht
Dez 29 15:38:52 kernemonster evolution[24937]: failed to commit changes to dconf: Zeitüberschreitung wurde erreicht
Dez 29 15:39:51 kernemonster kernel: INFO: task StreamTrans #13:41695 blocked for more than 122 seconds.
Dez 29 15:39:51 kernemonster kernel:       Tainted: P           OE     5.4.6-arch3-1 #1
Dez 29 15:39:51 kernemonster kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dez 29 15:39:51 kernemonster kernel: StreamTrans #13 D    0 41695  15083 0x00004080
Dez 29 15:39:51 kernemonster kernel: Call Trace:
Dez 29 15:39:51 kernemonster kernel:  ? __schedule+0x2f3/0x770
Dez 29 15:39:51 kernemonster kernel:  schedule+0x43/0xd0
Dez 29 15:39:51 kernemonster kernel:  jbd2_log_wait_commit+0xc7/0x130 [jbd2]
Dez 29 15:39:51 kernemonster kernel:  ? wait_woken+0x70/0x70
Dez 29 15:39:51 kernemonster kernel:  ext4_sync_file+0x3b1/0x400 [ext4]
Dez 29 15:39:51 kernemonster kernel:  do_fsync+0x38/0x70
Dez 29 15:39:51 kernemonster kernel:  __x64_sys_fsync+0x10/0x20
Dez 29 15:39:51 kernemonster kernel:  do_syscall_64+0x4e/0x140
Dez 29 15:39:51 kernemonster kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dez 29 15:39:51 kernemonster kernel: RIP: 0033:0x7fe85771abfb
Dez 29 15:39:51 kernemonster kernel: Code: Bad RIP value.
Dez 29 15:39:51 kernemonster kernel: RSP: 002b:00007fe83c4d2ad0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Dez 29 15:39:51 kernemonster kernel: RAX: ffffffffffffffda RBX: 00007fe835e14940 RCX: 00007fe85771abfb
Dez 29 15:39:51 kernemonster kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002a
Dez 29 15:39:51 kernemonster kernel: RBP: 00007fe83683c0e0 R08: 0000000000000000 R09: 00007fe8376427e0
Dez 29 15:39:51 kernemonster kernel: R10: fffffffffffffff0 R11: 0000000000000293 R12: 0000000000000000
Dez 29 15:39:51 kernemonster kernel: R13: 00007fe8509effc0 R14: 00007fe83683c0a0 R15: 0000000000000000

I installed smartmontools and did a short test with this result:

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.4.6-arch3-1] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2311         -
# 2  Vendor (0xff)       Completed without error       00%      2306         -
# 3  Vendor (0xff)       Completed without error       00%      2305         -
# 4  Vendor (0xff)       Completed without error       00%      2304         -
# 5  Vendor (0xff)       Completed without error       00%      2302         -
# 6  Vendor (0xff)       Completed without error       00%      2301         -
# 7  Vendor (0xff)       Completed without error       00%      2298         -
# 8  Vendor (0xff)       Completed without error       00%      2297         -
# 9  Vendor (0xff)       Completed without error       00%      2292         -
#10  Vendor (0xff)       Completed without error       00%      2290         -
#11  Vendor (0xff)       Completed without error       00%      2288         -
#12  Vendor (0xff)       Completed without error       00%      2287         -
#13  Vendor (0xff)       Completed without error       00%      2285         -
#14  Vendor (0xff)       Completed without error       00%      2284         -
#15  Vendor (0xff)       Completed without error       00%      2283         -
#16  Vendor (0xff)       Completed without error       00%      2280         -
#17  Vendor (0xff)       Completed without error       00%      2277         -
#18  Vendor (0xff)       Completed without error       00%      2276         -
#19  Vendor (0xff)       Completed without error       00%      2275         -
#20  Vendor (0xff)       Completed without error       00%      2274         -
#21  Vendor (0xff)       Completed without error       00%      2272         -

The disk is partition according to the WIKI part "LUKS on a partition".

Some bug reports mention a problem with 4.18 and Ryzen boards. I also have a Ryzen 1700 but run the current kernel (5.4.6-arch3-1)

Might this be related to Steam's way of allocating/manging the disk space? Or even worse a problem with the partitions? How can I track this down?

Thanks in advance

Offline

#2 2019-12-30 06:04:06

mxfm
Member
Registered: 2015-10-23
Posts: 163

Re: Call trace related to ext4

Some time ago I also experienced issues (system is unusable for 1-2 minutes) when writing large files (more than 700MiB-1GiB) and journal log contained traces related to jdb2 (like yours  jbd2_log_wait_commit+0xc7/0x130 [jbd2]). In my case setup was complicated by lvm on top of luks, so debugging this issue was hard and I didn't reach anything.

Some bug reports mention a problem with 4.18 and Ryzen boards. I also have a Ryzen 1700 but run the current kernel (5.4.6-arch3-1)

Might this be related to Steam's way of allocating/manging the disk space? Or even worse a problem with the partitions? How can I track this down?

I don't think it is related with hardware. You can try to copy around sufficiently large files (if your filesystem is not COW)to test whether this bug occurs only with Steam.

I installed smartmontools and did a short test with this result:

Short test or long test?

Offline

#3 2020-10-23 14:46:38

two_penguins
Member
Registered: 2019-12-29
Posts: 13

Re: Call trace related to ext4

Again I started to analyze the issue a little bit. In the last test I copied a 40Gb+ file over NFS and while it started to transfer at 90Mb/s after approx. 15Gb the rate decreased to 50-60Mb/s and after around 30Gb the Desktop freeze came back for a few seconds. The log during the copy process looks like this:

Okt 23 14:35:02 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:35:02 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:35:02 kernemonster tracker-extract[4217]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:35:02 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:35:02 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:35:04 kernemonster systemd[1357]: Started Application launched by gnome-shell.
Okt 23 14:35:05 kernemonster evolution-alarm[4233]: Your application did not unregister from D-Bus before destruction. Consider using g_application_run().
Okt 23 14:35:13 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:35:31 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c0010d.
Okt 23 14:35:31 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c00114.
Okt 23 14:35:32 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c0011b.
Okt 23 14:35:34 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c00114.
Okt 23 14:35:35 kernemonster gnome-shell[2328]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1c0010a specified for 0x1c0011b.
Okt 23 14:35:40 kernemonster rtkit-daemon[1297]: Supervising 3 threads of 2 processes of 1 users.
Okt 23 14:35:40 kernemonster rtkit-daemon[1297]: Supervising 3 threads of 2 processes of 1 users.
Okt 23 14:36:30 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:36:30 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:36:30 kernemonster tracker-extract[4448]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:36:30 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:36:30 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:36:41 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:37:11 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:37:11 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:37:11 kernemonster tracker-extract[4474]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:37:11 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:37:11 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:37:22 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:37:22 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:37:22 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:37:23 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:37:23 kernemonster unknown[4487]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files disappears >
Okt 23 14:37:23 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:37:33 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:39:43 kernemonster systemd[1357]: Started VTE child process 4557 launched by gnome-terminal-server process 2741.
Okt 23 14:39:47 kernemonster rtkit-daemon[1297]: Supervising 3 threads of 2 processes of 1 users.
Okt 23 14:39:47 kernemonster rtkit-daemon[1297]: Supervising 3 threads of 2 processes of 1 users.
Okt 23 14:39:59 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:39:59 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:39:59 kernemonster tracker-extract[4627]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:39:59 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:39:59 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:40:10 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:40:10 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:40:11 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:40:11 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:40:11 kernemonster unknown[4638]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files disappears >
Okt 23 14:40:11 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:40:21 kernemonster systemd[1357]: tracker-extract-3.service: Succeeded.
Okt 23 14:41:43 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' r>
Okt 23 14:41:43 kernemonster systemd[1357]: Starting Tracker metadata extractor...
Okt 23 14:41:43 kernemonster tracker-extract[4677]: tracker-extract-3 running as org.freedesktop.Tracker3.Miner.Extract. The service will exit when org.freedesktop.Tracker3.Miner.Files dis>
Okt 23 14:41:43 kernemonster dbus-daemon[2022]: [session uid=1000 pid=2022] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Okt 23 14:41:43 kernemonster systemd[1357]: Started Tracker metadata extractor.
Okt 23 14:41:44 kernemonster systemd[1357]: vte-spawn-6e1b176d-8a38-4f46-b1df-6f9055c1859c.scope: Succeeded.
Okt 23 14:41:44 kernemonster audit[2757]: USER_END pid=2757 uid=0 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_unix acct="root" exe="/usr/bin/su" hostname=kernemonster addr=? ter>
Okt 23 14:41:44 kernemonster kernel: audit: type=1106 audit(1603456904.469:180): pid=2757 uid=0 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_unix acct="root" exe="/usr/bin/su" ho>
Okt 23 14:41:44 kernemonster kernel: audit: type=1104 audit(1603456904.469:181): pid=2757 uid=0 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/bin/su" hostna>
Okt 23 14:41:44 kernemonster kernel: audit: type=1106 audit(1603456904.469:182): pid=2754 uid=0 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_limits,pam_unix,pam_permit acct="root>
Okt 23 14:41:44 kernemonster kernel: audit: type=1104 audit(1603456904.469:183): pid=2754 uid=0 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_u2f acct="root" exe="/usr/bin/sudo" hostnam>
Okt 23 14:41:44 kernemonster audit[2757]: CRED_DISP pid=2757 uid=0 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/bin/su" hostname=kernemonster addr=? termin>
Okt 23 14:41:44 kernemonster audit[2754]: USER_END pid=2754 uid=0 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname>
Okt 23 14:41:44 kernemonster audit[2754]: CRED_DISP pid=2754 uid=0 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_u2f acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/>
Okt 23 14:41:44 kernemonster su[2757]: pam_unix(su:session): session closed for user root
Okt 23 14:41:44 kernemonster systemd[1357]: vte-spawn-ab4ca689-29ac-4821-8c16-6dffa7b7aadf.scope: Succeeded.

This time the call trace didn't show up (but the freezes did). It doesn't seem to be related to Steam (like assumed in the beginning) as it wasn't involved this time. The same machine didn't have issues when copying the file from Windows 10 over SMB.

The output from smartctl -t long:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.1-arch1-1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
Device Model:     Crucial_CT480M500SSD1
Serial Number:    1351095EAB0F
LU WWN Device Id: 5 00a075 1095eab0f
Firmware Version: MU05
User Capacity:    480.103.981.056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Oct 23 16:45:24 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
					was aborted by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		( 2235) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  37) minutes.
Conveyance self-test routine
recommended polling time: 	 (   3) minutes.
SCT capabilities: 	       (0x0035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       1
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2427
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2227
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   098   098   000    Old_age   Always       -       75
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       222
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       8219
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   071   043   000    Old_age   Always       -       29 (Min/Max 0/57)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0031   098   098   000    Pre-fail  Offline      -       2
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   ---    Old_age   Always       -       13559275604
247 Host_Program_Page_Count 0x0032   100   100   ---    Old_age   Always       -       427303352
248 FTL_Program_Page_Count  0x0032   100   100   ---    Old_age   Always       -       1934431862

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2427         -
# 2  Vendor (0xff)       Completed without error       00%      2425         -
# 3  Vendor (0xff)       Completed without error       00%      2422         -
# 4  Vendor (0xff)       Completed without error       00%      2416         -
# 5  Vendor (0xff)       Completed without error       00%      2414         -
# 6  Vendor (0xff)       Completed without error       00%      2413         -
# 7  Vendor (0xff)       Completed without error       00%      2404         -
# 8  Vendor (0xff)       Completed without error       00%      2401         -
# 9  Vendor (0xff)       Completed without error       00%      2400         -
#10  Vendor (0xff)       Completed without error       00%      2398         -
#11  Vendor (0xff)       Completed without error       00%      2396         -
#12  Vendor (0xff)       Completed without error       00%      2394         -
#13  Vendor (0xff)       Completed without error       00%      2390         -
#14  Vendor (0xff)       Completed without error       00%      2387         -
#15  Vendor (0xff)       Completed without error       00%      2386         -
#16  Vendor (0xff)       Completed without error       00%      2382         -
#17  Vendor (0xff)       Completed without error       00%      2379         -
#18  Vendor (0xff)       Completed without error       00%      2377         -
#19  Vendor (0xff)       Completed without error       00%      2366         -
#20  Vendor (0xff)       Completed without error       00%      2353         -
#21  Vendor (0xff)       Completed without error       00%      2352         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

How could I analyze i.e. if luks is working correctly and not having issues that don't show up in the log?

Offline

Board footer

Powered by FluxBB