Hi,
I migrated my single-disk ext4 filesystem to an md RAID1 array (ThinkPad W530) using two Crucial BX500 2TB disks.
Since the migration I have been experiencing random, extreme slowdowns every few hours. The slowdowns seem related to disk activity. When they start, neither the kernel messages nor journalctl show any errors.
I use dwm. When a slowdown starts the UI does not freeze, but anything that touches the disk becomes very slow, including internet browsing; an `ls` command takes 2-5 seconds to execute. The slowdown lasts 10-15 minutes, then everything returns to normal.
mdadm -D shows no issues, and neither does smartctl on the disks. I tried multiple I/O schedulers (none, bfq, mq-deadline) as well as different process schedulers and kernels. I run linux-ck, but the problems started on the stable kernel.
I can still return the disks and was wondering whether it is worth trying the RAID under Btrfs or just switching to another brand.
What other logs or benchmarks can I use to troubleshoot this issue?
Update: I am getting an average write speed of 13 MB/s using KDiskMark.
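One way to pin down when the slowdowns start and stop, given that the kernel log stays silent, is a small probe that times a synced write at a fixed interval and prints a timestamp whenever it is slow. The following is only a minimal sketch: the function names, the 4 KiB payload, the 5-second interval and the 0.5-second threshold are all arbitrary choices for illustration, not tuned values.

```python
import os
import tempfile
import time


def fsync_write_latency(directory, size=4096):
    """Write `size` bytes to a temporary file in `directory`, fsync it,
    and return the elapsed time in seconds."""
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        os.write(fd, payload)
        os.fsync(fd)  # force the data down to the block device (here: the md array)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)


def monitor(directory, interval=5.0, threshold=0.5, samples=10):
    """Probe `samples` times, `interval` seconds apart.

    Prints and returns (timestamp, latency) for every probe that took at
    least `threshold` seconds.
    """
    slow = []
    for _ in range(samples):
        latency = fsync_write_latency(directory)
        if latency >= threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} slow fsync: {latency:.3f}s", flush=True)
            slow.append((stamp, latency))
        time.sleep(interval)
    return slow
```

Calling something like `monitor(os.path.expanduser("~"), samples=1000)` in a spare terminal would leave a timestamped trace that can be lined up against journalctl or other logs after the next slowdown. The directory has to be on the filesystem that sits on /dev/md0 for the numbers to mean anything.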
Linux w530 6.8.1-1-ck-generic-v2 #1 SMP PREEMPT_DYNAMIC
/dev/md0:
Version : 1.2
Creation Time : Sat Mar 23 22:43:32 2024
Raid Level : raid1
Array Size : 1953380352 (1862.89 GiB 2000.26 GB)
Used Dev Size : 1953380352 (1862.89 GiB 2000.26 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue May 7 13:18:46 2024
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : any:0
UUID : 69cc98c9:1344ebc1:dddfc34d:1fdccb8c
Events : 44519
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
smartctl disk1:
Model Family: Crucial/Micron Client SSDs
Device Model: CT2000BX500SSD1
Serial Number: 2342E8816905
LU WWN Device Id: 5 00a075 1e8816905
Firmware Version: M6CR061
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue May 7 13:17:20 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 0
5 Reallocate_NAND_Blk_Cnt 0x0032 100 100 010 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 1193
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 74
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Ave_Block-Erase_Count 0x0032 099 099 000 Old_age Always - 15
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 23
180 Unused_Reserve_NAND_Blk 0x0033 100 100 000 Pre-fail Always - 53
183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age Always - 0
184 Error_Correction_Count 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 062 053 000 Old_age Always - 38 (Min/Max 29/47)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_ECC_Cnt 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
202 Percent_Lifetime_Remain 0x0030 099 099 001 Old_age Offline - 1
206 Write_Error_Rate 0x000e 100 100 000 Old_age Always - 0
210 Success_RAIN_Recov_Cnt 0x0032 100 100 000 Old_age Always - 0
246 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 12620413040
247 Host_Program_Page_Count 0x0032 100 100 000 Old_age Always - 394387907
248 FTL_Program_Page_Count 0x0032 100 100 000 Old_age Always - 527480832
249 Unkn_CrucialMicron_Attr 0x0032 100 100 000 Old_age Always - 0
251 Unkn_CrucialMicron_Attr 0x0032 100 100 000 Old_age Always - 2986038327
252 Unkn_CrucialMicron_Attr 0x0032 100 100 000 Old_age Always - 3
253 Unkn_CrucialMicron_Attr 0x0032 100 100 000 Old_age Always - 0
SMART Error Log not supported
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1023 -
Selective Self-tests/Logging not supported
The above only provides legacy SMART information - try 'smartctl -x' for more
smartctl disk2:
Model Family: Crucial/Micron Client SSDs
Device Model: CT2000BX500SSD1
Serial Number: 2342E881581E
LU WWN Device Id: 5 00a075 1e881581e
Firmware Version: M6CR061
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue May 7 13:18:05 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 0
5 Reallocate_NAND_Blk_Cnt 0x0032 100 100 010 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 1071
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 72
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Ave_Block-Erase_Count 0x0032 099 099 000 Old_age Always - 9
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 20
180 Unused_Reserve_NAND_Blk 0x0033 100 100 000 Pre-fail Always - 43
183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age Always - 0
184 Error_Correction_Count 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 065 044 000 Old_age Always - 35 (Min/Max 27/56)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_ECC_Cnt 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
202 Percent_Lifetime_Remain 0x0030 099 099 001 Old_age Offline - 1
206 Write_Error_Rate 0x000e 100 100 000 Old_age Always - 0
210 Success_RAIN_Recov_Cnt 0x0032 100 100 000 Old_age Always - 0
246 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 6473132044
247 Host_Program_Page_Count 0x0032 100 100 000 Old_age Always - 202285376
248 FTL_Program_Page_Count 0x0032 100 100 000 Old_age Always - 899848192
249 Unkn_CrucialMicron_Attr 0x0032 100 100 000 Old_age Always - 0
251 Unkn_CrucialMicron_Attr 0x0032 100 100 000 Old_age Always - 2768195272
252 Unkn_CrucialMicron_Attr 0x0032 100 100 000 Old_age Always - 5
253 Unkn_CrucialMicron_Attr 0x0032 100 100 000 Old_age Always - 0
SMART Error Log not supported
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 902 -
Selective Self-tests/Logging not supported
The above only provides legacy SMART information - try 'smartctl -x' for more
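The raw counters in the two smartctl dumps above can be turned into more readable figures. The sketch below rests on two assumptions that are not stated in the output itself: that attribute 246 (Total_LBAs_Written) counts 512-byte logical sectors, and that write amplification can be estimated as (attribute 247 + attribute 248) / attribute 247, i.e. host plus FTL program pages over host program pages. The helper names are made up for this example.

```python
SECTOR_BYTES = 512  # logical sector size reported by smartctl for both disks

# Raw values copied from the smartctl attribute tables above.
disks = {
    "disk1": {"lbas_written": 12620413040,
              "host_pages": 394387907,
              "ftl_pages": 527480832},
    "disk2": {"lbas_written": 6473132044,
              "host_pages": 202285376,
              "ftl_pages": 899848192},
}


def terabytes_written(lbas):
    """Host writes in decimal terabytes, assuming 512-byte LBAs."""
    return lbas * SECTOR_BYTES / 1e12


def write_amplification(host_pages, ftl_pages):
    """Assumed estimate: (host + FTL program pages) / host program pages."""
    return (host_pages + ftl_pages) / host_pages


for name, d in disks.items():
    tb = terabytes_written(d["lbas_written"])
    waf = write_amplification(d["host_pages"], d["ftl_pages"])
    print(f"{name}: {tb:.2f} TB written, write amplification ~{waf:.2f}")

# disk1: 6.46 TB written, write amplification ~2.34
# disk2: 3.31 TB written, write amplification ~5.45
```

If these assumptions hold, the second disk has taken roughly half the host writes of the first but shows a noticeably higher ratio of internal to host page programs. Whether that is related to the slowdowns is not something these numbers can establish on their own.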
Last edited by sp4ke (2024-05-07 15:31:38)