
#1 2026-02-08 15:33:35

samtk0225
Member
From: Germany, BW
Registered: 2026-02-08
Posts: 4

RAID6 failure, strange values

Hello,

This morning my RAID-6 array failed, after less than three years.
The computer is only booted up twice a month, for about 10 minutes at a time.
I'm just glad I have the same data backed up on a USB hard drive,
and after this, I'm getting a second one.
Whether the array can be repaired or not, it will not be
used for the time being.

The RAID-6 system:
OS : Arch Linux 6.18.7-arch1-1 x86_64
CPU : Intel Xeon E-2388G
Mainboard : Asus P12R-M-10G-2T
RAM : 16GB DDR4-3200 ECC
Controller : Broadcom HBA 9500-8i
SSD : 8x Crucial MX500 4TB, connected to the Broadcom controller
Power supply : be quiet! Straight Power 12, 1000 W

The motherboard, controller, and SSDs all have the latest firmware.
Apart from the controller, no other expansion hardware is installed
in the computer (headless; access via SSH and NFS).
The RAID-6 volume uses the Btrfs file system.
The mount parameters of the RAID volume on the server are:
defaults,compress=zstd:1,ssd,discard=async
The mount parameters on the client computers are:
defaults,noauto,users

What I find strange is that all 8 SSDs are present and accessible,
and according to the SMART values, as far as I understand them,
they are all error-free. But according to mdadm, something is wrong:
sda1 and sdc1 are missing from the array entirely, which can't be right,
because otherwise they wouldn't be recognized by the system or by smartmontools at all.
sdg1 and sdh1 are declared faulty by mdadm.

Or is the controller defective?

lsblk -f
NAME        FSTYPE            FSVER LABEL            UUID
sda
└─sda1      linux_raid_member 1.2   fileserver:RAID6 ...
sdb
└─sdb1      linux_raid_member 1.2   fileserver:RAID6 ...
  └─md127
sdc
└─sdc1      linux_raid_member 1.2   fileserver:RAID6 ...
sdd
└─sdd1      linux_raid_member 1.2   fileserver:RAID6 ...
  └─md127
sde
└─sde1      linux_raid_member 1.2   fileserver:RAID6 ...
  └─md127
sdf
└─sdf1      linux_raid_member 1.2   fileserver:RAID6 ...
  └─md127
sdg
└─sdg1      linux_raid_member 1.2   fileserver:RAID6 ...
  └─md127
sdh
└─sdh1      linux_raid_member 1.2   fileserver:RAID6 ...
  └─md127
sudo mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Sat Sep 30 18:19:57 2023
        Raid Level : raid6
     Used Dev Size : 18446744073709551615
      Raid Devices : 8
     Total Devices : 6
       Persistence : Superblock is persistent

       Update Time : Sat Feb  7 13:38:02 2026
             State : active, FAILED, Not Started
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 2
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : unknown

              Name : fileserver:RAID6  (local to host fileserver)
              UUID : ...
            Events : 4275

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed
       -       0        0        4      removed
       -       0        0        5      removed
       -       0        0        6      removed
       -       0        0        7      removed

       -       8      113        -      faulty   /dev/sdh1
       -       8       97        -      faulty   /dev/sdg1
       -       8       81        5      sync   /dev/sdf1
       -       8       65        4      sync   /dev/sde1
       -       8       49        3      sync   /dev/sdd1
       -       8       17        0      sync   /dev/sdb1
cat /proc/mdstat
Personalities : [raid4] [raid5] [raid6]
md127 : inactive sdf1[5] sdd1[3] sdh1[7](F) sdg1[6](F) sdb1[0] sde1[4]
15627014144 blocks super 1.2
unused devices: <none>
sudo mdadm --detail /dev/sdg1
mdadm: /dev/sdg1 does not appear to be an md device
sudo mdadm --detail /dev/sdh1
mdadm: /dev/sdh1 does not appear to be an md device
sudo smartctl -a /dev/sdg1
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.18.7-arch1-1] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT4000MX500SSD1
Serial Number:    ...
LU WWN Device Id: 5 ...
Firmware Version: M3CR046
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database 7.5/6083
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb  7 17:42:39 2026 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3265
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       183
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   000    Old_age   Always       -       2
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       156
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       240
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   071   044   000    Old_age   Always       -       29 (Min/Max 0/56)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       9424242918
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       75709343
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       70653098

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Completed [00% left] (0-65535)
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
sudo smartctl -a /dev/sdh1
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.18.7-arch1-1] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT4000MX500SSD1
Serial Number:    ...
LU WWN Device Id: 5 ...
Firmware Version: M3CR046
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database 7.5/6083
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb  7 17:46:58 2026 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3308
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       183
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   000    Old_age   Always       -       2
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       156
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       273
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   070   046   000    Old_age   Always       -       30 (Min/Max 0/54)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       9517752593
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       76567085
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       71469014

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Completed [00% left] (0-65535)
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
sudo smartctl -a /dev/sda1
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.18.7-arch1-1] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT4000MX500SSD1
Serial Number:    ...
LU WWN Device Id: 5 ...
Firmware Version: M3CR046
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database 7.5/6083
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb  7 17:49:08 2026 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3306
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       184
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   000    Old_age   Always       -       3
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       156
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       259
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   073   050   000    Old_age   Always       -       27 (Min/Max 0/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       12194506750
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       97695686
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       92699913

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Completed [00% left] (0-65535)
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
sudo smartctl -a /dev/sdc1
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.18.7-arch1-1] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT4000MX500SSD1
Serial Number:    ...
LU WWN Device Id: 5 ...
Firmware Version: M3CR046
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database 7.5/6083
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb  7 17:51:17 2026 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3289
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       184
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   000    Old_age   Always       -       3
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       156
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       276
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   074   047   000    Old_age   Always       -       26 (Min/Max 0/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       12195276537
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       97698924
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       94815348

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Completed [00% left] (0-65535)
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more


#2 2026-02-08 15:36:36

frostschutz
Member
Registered: 2013-11-15
Posts: 1,610

Re: RAID6 failure, strange values

mdadm --examine for all? dmesg of failed assembly (syslogs of failure event if you have them, but I guess they're on the raid)?


#3 2026-02-08 16:00:40

samtk0225
Member
From: Germany, BW
Registered: 2026-02-08
Posts: 4

Re: RAID6 failure, strange values

frostschutz wrote:

mdadm --examine for all?

sudo mdadm --examine
mdadm: No devices to examine
frostschutz wrote:

dmesg of failed assembly

I haven't tried any recovery or anything else yet.

sudo dmesg | grep "md127"
[    3.863271] md/raid:md127: device sde1 operational as raid disk 4
[    3.863274] md/raid:md127: device sda1 operational as raid disk 1
[    3.863274] md/raid:md127: device sdd1 operational as raid disk 3
[    3.863275] md/raid:md127: device sdb1 operational as raid disk 0
[    3.863275] md/raid:md127: device sdf1 operational as raid disk 5
[    3.863981] md/raid:md127: not enough operational devices (3/8 failed)
[    3.888515] md/raid:md127: failed to run raid set.
[    3.888522] md127: ADD_NEW_DISK not supported
[    4.086511] md127: ADD_NEW_DISK not supported
frostschutz wrote:

(syslogs of failure event if you have them, but I guess they're on the raid)?

The raid is not a boot/system drive.


#4 2026-02-08 19:43:45

topcat01
Member
Registered: 2019-09-17
Posts: 273

Re: RAID6 failure, strange values

sudo mdadm --examine /dev/sd{a,c}1

Also, the full output of

sudo journalctl -b

would be useful. It does seem like a hardware issue of some sort.

BTW, I think a journal device would be a good idea with this setup. If you crash for any reason, btrfs metadata might get corrupted due to the write hole.
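For reference, mdadm's write-journal support takes a dedicated (ideally fast, power-loss-protected) device. A sketch, with a hypothetical NVMe partition as the journal; the flags are as documented in the mdadm man page, but check your version for exact constraints:

```shell
# At creation time the journal is named up front (device paths hypothetical):
#   sudo mdadm --create /dev/md0 --level=6 --raid-devices=8 \
#       --write-journal /dev/nvme0n1p1 /dev/sd[a-h]1
# mdadm also documents --add-journal for adding a journal back to an
# existing array; constraints vary by mdadm/kernel version.
sudo mdadm --manage /dev/md127 --add-journal /dev/nvme0n1p1
```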


#5 2026-02-08 19:50:21

topcat01
Member
Registered: 2019-09-17
Posts: 273

Re: RAID6 failure, strange values

In case you need to rebuild from scratch, you might consider just using ZFS RAIDz2, which is rock solid, with all the data integrity goodies. I use BTRFS RAID for Single/0/1, and ZFS for 5/6 (RAIDz). I don't really use mdadm anymore.
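For what it's worth, the equivalent pool over eight disks is one command. A sketch only: the pool name and the by-id paths are placeholders, and by-id paths sidestep the /dev/sdX reshuffling seen in this thread:

```shell
# 8-disk RAIDz2 pool (double parity, like RAID6); ashift=12 matches the
# 4 KiB physical sectors reported by smartctl above.
sudo zpool create -o ashift=12 tank raidz2 \
    /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2 \
    /dev/disk/by-id/ata-SSD3 /dev/disk/by-id/ata-SSD4 \
    /dev/disk/by-id/ata-SSD5 /dev/disk/by-id/ata-SSD6 \
    /dev/disk/by-id/ata-SSD7 /dev/disk/by-id/ata-SSD8
```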


#6 2026-02-09 14:41:55

cryptearth
Member
Registered: 2024-02-03
Posts: 1,965

Re: RAID6 failure, strange values

Although I agree with topcat in recommending ZFS over md, there has to be a reason why a system that's pretty much a cold vault suddenly comes up with multiple drives failing at once.
To me this smells like an issue with the power supply: if that is really all the hardware in the system, the PSU is way overkill.
Even maxing it out with synthetic stress tests you will hardly get over 200 W total, if you even get that high.
It's likely that the PSU is somewhat unstable when hit with only 10% of its rated power, and unstable power can cause issues with all components.
Modern switch-mode PSUs are designed for a load between 20% and 80%.
My guess at what happened here: due to the over-spec, the PSU runs unstable at low load, which caused the multi-failure. Either throw in a 200 W GPU and keep it, along with the CPU, at a constant load so the PSU stays above 20%, or scale down to a PSU of about 500 W or even smaller for better utilization and thereby stable power output.

Yes, power issues are often overlooked; the root cause here is something not even ZFS could have helped with.

Also: don't rely on a single external drive for your backup, get at least a second one!
And ditch Btrfs!
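To put numbers on that (the 200 W figure is from the stress-test estimate above; the 100 W everyday figure is my own ballpark):

```shell
# Load as a percentage of the 1000 W rating.
echo $(( 200 * 100 / 1000 ))   # 200 W stress-test draw -> prints 20
echo $(( 100 * 100 / 1000 ))   # ~100 W everyday draw   -> prints 10
```

Both figures sit at or below the bottom of the 20-80% band.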


#7 2026-02-09 16:08:01

sharow
Member
Registered: 2015-03-10
Posts: 5

Re: RAID6 failure, strange values

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       183
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       156

I think these two values indicate something: they show that 85% of the
power cycles ended in unexpected power loss.
Does the value also increase when the drives are connected to the
motherboard's SATA ports?
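The percentage checks out against the raw values quoted above:

```shell
# 156 unexpected power losses out of 183 power cycles.
echo $(( 156 * 100 / 183 ))   # prints 85
```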


#8 2026-02-09 16:44:39

cryptearth
Member
Registered: 2024-02-03
Posts: 1,965

Re: RAID6 failure, strange values

sharow wrote:

This shows that 85% of the shutdowns were unexpected.

Agreed - and combined with "only booted up twice a month for 10 minutes",
it sounds like "yup, and when I'm done I just kill the power instead of
properly shutting down the system".
Something doesn't add up here ...


#9 2026-02-10 18:14:03

samtk0225
Member
From: Germany, BW
Registered: 2026-02-08
Posts: 4

Re: RAID6 failure, strange values

Before the SSDs, HDDs were used here.
I kept the power supply.
I always shut down the system with sudo shutdown -h now,
and I can't recall a power outage that could have damaged the RAID volume.
It's quite possible that the power supply is oversized,
and that's why the problems are occurring. I never paid attention to that.
I'm going to install a Bequiet System Power 11 450 W power supply instead.

I also considered using ZFS back then.
But I didn't because, as far as I understand,
everything becomes more complicated. For example,
updating with sudo pacman -Syu is no longer straightforward,
since you have to manually adjust the kernel and other things
with every update.
Is that still the case?

sudo mdadm --examine /dev/sd{a,b,c,d,e,f,g,h}1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ...
           Name : fileserver:RAID6  (local to host fileserver)
  Creation Time : Sat Sep 30 18:19:57 2023
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 7813507072 sectors (3.64 TiB 4.00 TB)
     Array Size : 23440521216 KiB (21.83 TiB 24.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : ...

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Feb  7 13:38:02 2026
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 8db59ff6 - correct
         Events : 4275

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
   
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ...
           Name : fileserver:RAID6  (local to host fileserver)
  Creation Time : Sat Sep 30 18:19:57 2023
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 7813507072 sectors (3.64 TiB 4.00 TB)
     Array Size : 23440521216 KiB (21.83 TiB 24.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : ...

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Feb  7 13:38:02 2026
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 92128170 - correct
         Events : 4275

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
   
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ...
           Name : fileserver:RAID6  (local to host fileserver)
  Creation Time : Sat Sep 30 18:19:57 2023
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 7813507072 sectors (3.64 TiB 4.00 TB)
     Array Size : 23440521216 KiB (21.83 TiB 24.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : ...

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Feb  7 13:38:02 2026
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 279a1a0b - correct
         Events : 4275

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
   
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ...
           Name : fileserver:RAID6  (local to host fileserver)
  Creation Time : Sat Sep 30 18:19:57 2023
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 7813507072 sectors (3.64 TiB 4.00 TB)
     Array Size : 23440521216 KiB (21.83 TiB 24.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : ...

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Feb  7 13:38:02 2026
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 83782fe7 - correct
         Events : 4275

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
   
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ...
           Name : fileserver:RAID6  (local to host fileserver)
  Creation Time : Sat Sep 30 18:19:57 2023
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 7813507072 sectors (3.64 TiB 4.00 TB)
     Array Size : 23440521216 KiB (21.83 TiB 24.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : ...

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Feb  7 13:38:02 2026
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 6b107ef0 - correct
         Events : 4275

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
   
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ...
           Name : fileserver:RAID6  (local to host fileserver)
  Creation Time : Sat Sep 30 18:19:57 2023
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 7813507072 sectors (3.64 TiB 4.00 TB)
     Array Size : 23440521216 KiB (21.83 TiB 24.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : ...

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Feb  7 13:38:02 2026
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 748acddf - correct
         Events : 4275

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
   
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ...
           Name : fileserver:RAID6  (local to host fileserver)
  Creation Time : Sat Sep 30 18:19:57 2023
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 7813507072 sectors (3.64 TiB 4.00 TB)
     Array Size : 23440521216 KiB (21.83 TiB 24.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : ...

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Feb  7 12:43:02 2026
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 5965089 - correct
         Events : 4274

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : AAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
   
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ...
           Name : fileserver:RAID6  (local to host fileserver)
  Creation Time : Sat Sep 30 18:19:57 2023
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 7813507072 sectors (3.64 TiB 4.00 TB)
     Array Size : 23440521216 KiB (21.83 TiB 24.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : ...

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Feb  7 12:43:02 2026
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 4604721c - correct
         Events : 4274

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 7
   Array State : AAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

Last edited by samtk0225 (2026-02-10 18:18:31)

Offline

#10 2026-02-10 18:59:40

cryptearth
Member
Registered: 2024-02-03
Posts: 1,965

Re: RAID6 failure,strange values

samtk0225 wrote:

I also considered using ZFS back then.
But I didn't because, as far as I understand,
everything becomes more complicated. For example,
updating with sudo pacman -Syu is no longer straightforward,
since you have to manually adjust the kernel and other things
with every update.
Is that still the case?

have a look at: https://github.com/archzfs/archzfs
be aware: the project's gist is to support the latest kernel supported by the latest stable ZFS - which usually results in "we target linux-lts"
in the past a few kernel updates were safe with an older ZFS version by patching an --experimental flag into the build or DKMS - but lately every kernel update required a ZFS update to support it - hence I dropped that again in my fork
yes, the project had some slack at some point - but currently it's done via GitHub Actions which run once every day
you can also go my route - fork it and build your own packages - it became a bit of a meme in the repo who's first to open the PR for the next ZFS update for the next kernel - but unless it lands on a weekend you can expect it to be up-to-date within 24h
the official archzfs is a bit defensive (which I actually support), so they don't ship RC builds (which I do in my fork) - but even ZFS upstream says: "the next version is released when it's done" - pretty much the same as for every arch package

TL;DR: if you're fine living on the edge with potential data loss, you can use ZFS on Arch with the default kernel - otherwise stick to lts

Offline

#11 2026-02-10 19:47:46

frostschutz
Member
Registered: 2013-11-15
Posts: 1,610

Re: RAID6 failure,strange values

Quite odd, the last two drives sdg/sdh (device role 6/7) somehow fell off the array (event count off by one, update time off, and marked failed by all other drives), but it's RAID 6, so it should still be running with only two drives missing.

Yet in assembly it claims there are 3 failed drives. Not entirely clear to me why that is; either I'm overlooking something in the examine or there's a bug... there was a similar issue fairly recently but I don't remember how that went.

What does your mdadm.conf look like? If it's overspecific (listing devices) then it could be part of the issue. Or are these drives perhaps detected late? Those ADD_NEW_DISK messages could be related but I haven't seen that before.

The drives are not even listed at all in that dmesg output. Was there anything else in dmesg? (Don't use | grep, there's a chance of missing some non-matching but related lines.) Also check your journal (if you have it) around Sat Feb 7 12:43:02 2026 what was going on there.

Try mdadm --stop /dev/md127 then --assemble it manually using only the 6 drives that should be good, what does it say (mdadm itself and dmesg)?

If that does not work and you think there was not much write activity around the update time, you can also try your luck with --assemble --force.

Last edited by frostschutz (2026-02-10 19:51:47)

Offline

#12 2026-02-10 20:34:32

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 73,309

Re: RAID6 failure,strange values

It's quite possible that the power supply is oversized,

???
Also, according to the smart data not only were 85% of all power-offs sketchy but also the drives ran ~18h on average per power cycle - not 10 minutes.
A lot of things here don't quite add up.

Offline

#13 2026-02-10 21:46:41

cryptearth
Member
Registered: 2024-02-03
Posts: 1,965

Re: RAID6 failure,strange values

seth wrote:

It's quite possible that the power supply is oversized,

???
Also, according to the smart data not only were 85% of all power-offs sketchy but also the drives ran ~18h on average per power cycle - not 10 minutes.
A lot of things here don't quite add up.

not my words:

samtk0225 wrote:

The computer is only booted up twice a month for about 10 minutes.

as for the PSU being "oversized": as mentioned, modern switch-mode PSUs have a sweet spot and, similar to old linear ones, still need at least some load - and as far as I know the 80 Plus certification only starts at around 20% load
so a system with the given inventory doesn't even come close to the lower 20% mark under full load - hence it's quite possible that the PSU is simply unstable when the system is idle and maybe draws less than 100 W total
have a look over https://github.com/openzfs/zfs/issues - there are countless issues which turned out to be caused by a faulty or otherwise misbehaving PSU - hence I find it fair to point out: 1000 W is WAY oversized for the given inventory - even when it was previously built with regular HDDs (in a big cluster one enables either staggered spin-up or power-up in standby to limit the inrush current and let the drives spin up one by one)
from personal testing, a 300 W PSU is powerful enough for 8 HDDs to survive the inrush current without shutting itself off - although by the maths it's quite tight
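
To put rough numbers on that back-of-the-envelope claim - the 30 W per-drive spin-up peak used here is an assumed typical 3.5" figure, not a measured or datasheet value:

```shell
# Worst-case inrush estimate if all 8 HDDs spin up simultaneously.
# 30 W peak per drive during spin-up is an assumption, not a datasheet value.
awk 'BEGIN { drives = 8; spinup_w = 30; printf "%d W\n", drives * spinup_w }'
# prints 240 W
```

which indeed leaves little headroom on a 300 W unit once the rest of the system is added.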

Offline

#14 2026-02-10 22:03:45

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 73,309

Re: RAID6 failure,strange values

not my words:

Didn't say so either tongue

Interesting point about the overdimensioned PSU, but the outcome seems overly specific (you'd expect more CPU/RAM related issues?) and wouldn't that render most (optimus) gaming systems unstable that come at 750W but typically idle at mostly 10% of  that?

Offline

#15 2026-02-10 23:47:15

loqs
Member
Registered: 2014-03-06
Posts: 18,796

Re: RAID6 failure,strange values

Have you tried switching a working drive with a failing drive or moving a failing drive to connect directly to the mainboard assuming it has SATA ports?

Offline

#16 2026-02-11 05:58:45

sharow
Member
Registered: 2015-03-10
Posts: 5

Re: RAID6 failure,strange values

I think this is a compatibility issue, so the motherboard's SATA ports should work fine.

here is the compatibility list for your device:
https://techdocs.broadcom.com/us/en/sto … icron.html

Compatibility lists are there for a reason.
https://www.broadcom.com/support/knowle … ontrollers
That one is about the SAS3018, not the SAS3808, but I think it still has similar limitations.

And neither the manual nor the spec sheet says a word about supporting TRIM.
If it's for cold storage, it may be worth disabling TRIM for a while.
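
Disabling it would just mean dropping discard=async from the mount options on the RAID host, e.g. in /etc/fstab (a sketch - the UUID and mount point below are placeholders; fstrim can still be run manually once the array is healthy again):

```
# RAID volume mounted without online discard (UUID and mountpoint are placeholders)
UUID=<btrfs-volume-uuid>  /srv/raid  btrfs  defaults,compress=zstd:1,ssd,nodiscard  0 0
```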

Offline

#17 2026-02-11 14:10:44

samtk0225
Member
From: Germany, BW
Registered: 2026-02-08
Posts: 4

Re: RAID6 failure,strange values

frostschutz wrote:

Quite odd, the last two drives sdg/sdh (device role 6/7) somehow fell off the array (event count off by one, update time off, and marked failed by all other drives), but it's RAID 6, so it should still be running with only two drives missing.

Yet in assembly it claims there are 3 failed drives. Not entirely clear to me why that is; either I'm overlooking something in the examine or there's a bug... there was a similar issue fairly recently but I don't remember how that went.

What does your mdadm.conf look like? If it's overspecific (listing devices) then it could be part of the issue. Or are these drives perhaps detected late? Those ADD_NEW_DISK messages could be related but I haven't seen that before.

The drives are not even listed at all in that dmesg output. Was there anything else in dmesg? (Don't use | grep, there's a chance of missing some non-matching but related lines.) Also check your journal (if you have it) around Sat Feb 7 12:43:02 2026 what was going on there.

Try mdadm --stop /dev/md127 then --assemble it manually using only the 6 drives that should be good, what does it say (mdadm itself and dmesg)?

If that does not work and you think there was not much write activity around the update time, you can also try your luck with --assemble --force.

sudo mdadm --assemble /dev/md127
mdadm: /dev/md127 not identified in config file.

I have never edited the /etc/mdadm.conf
and I am not a Linux expert.

# mdadm configuration file
#
# mdadm will function properly without the use of a configuration file,
# but this file is useful for keeping track of arrays and member disks.
# In general, a mdadm.conf file is created, and updated, after arrays
# are created. This is the opposite behavior of /etc/raidtab which is
# created prior to array construction.
#
#
# the config file takes two types of lines:
#
#       DEVICE lines specify a list of devices of where to look for
#         potential member disks
#
#       ARRAY lines specify information about how to identify arrays so
#         so that they can be activated
#
# You can have more than one device line and use wild cards. The first
# example includes SCSI the first partition of SCSI disks /dev/sdb,
# /dev/sdc, /dev/sdd, /dev/sdj, /dev/sdk, and /dev/sdl. The second
# line looks for array slices on IDE disks.
#
#DEVICE /dev/sd[bcdjkl]1
#DEVICE /dev/hda1 /dev/hdb1
#
# If you mount devfs on /dev, then a suitable way to list all devices is:
#DEVICE /dev/discs/*/*
#
# The designation "partitions" will scan all partitions found in
# /proc/partitions
DEVICE partitions

# The AUTO line can control which arrays get assembled by auto-assembly,
# meaing either "mdadm -As" when there are no 'ARRAY' lines in this file,
# or "mdadm --incremental" when the array found is not listed in this file.
# By default, all arrays that are found are assembled.
# If you want to ignore all DDF arrays (maybe they are managed by dmraid),
# and only assemble 1.x arrays if which are marked for 'this' homehost,
# but assemble all others, then use
#AUTO -ddf homehost -1.x +all
#
# ARRAY lines specify an array to assemble and a method of identification.
# Arrays can currently be identified by using a UUID, superblock minor number,
# or a listing of devices.
#
#       super-minor is usually the minor number of the metadevice
#       UUID is the Universally Unique Identifier for the array
# Each can be obtained using
#
#       mdadm -D <md>
#
#ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
#ARRAY /dev/md1 super-minor=1
#ARRAY /dev/md2 devices=/dev/hda1,/dev/hdb1
#
# ARRAY lines can also specify a "spare-group" for each array.  mdadm --monitor
# will then move a spare between arrays in a spare-group if one array has a failed
# drive but no spare
#ARRAY /dev/md4 uuid=b23f3c6d:aec43a9f:fd65db85:369432df spare-group=group1
#ARRAY /dev/md5 uuid=19464854:03f71b1b:e0df2edd:246cc977 spare-group=group1
#
# When used in --follow (aka --monitor) mode, mdadm needs a
# mail address and/or a program.  This can be given with "mailaddr"
# and "program" lines to that monitoring can be started using
#    mdadm --follow --scan & echo $! > /run/mdadm/mon.pid
# If the lines are not found, mdadm will exit quietly
#MAILADDR root@mydomain.tld
#PROGRAM /usr/sbin/handle-mdadm-events
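
For reference, the ARRAY line that mdadm complains is missing can be generated from the on-disk superblocks (a sketch - the UUID in the real output is the array UUID, elided here):

```shell
# Scan member superblocks and append the resulting ARRAY line,
# so "mdadm --assemble --scan" can identify the array from the config.
sudo mdadm --examine --scan | sudo tee -a /etc/mdadm.conf
# typical output line: ARRAY /dev/md/RAID6 metadata=1.2 UUID=... name=fileserver:RAID6
```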

I think I'll replace the SATA power adapters - they look cheap, and I don't trust them anymore.

Offline

#18 2026-02-11 15:21:25

frostschutz
Member
Registered: 2013-11-15
Posts: 1,610

Re: RAID6 failure,strange values

Try `mdadm --assemble --scan`, or `mdadm --assemble --run /dev/md127 /dev/sd[abcdef]1`, or finally `mdadm --assemble --force /dev/md127 /dev/sd[abcdefgh]1`.

Needs `mdadm --stop /dev/md127` before each attempt (if it was previously assembled) since you can't assemble an array that's already listed in /proc/mdstat even if it's not running.

samtk0225 wrote:

I think I'll replace the SATA power adapters

If you KNOW there are hardware issues, you should fix those first, sure. It's impossible for me to say remotely with nothing to go on, so not my place to speculate randomly.

However it won't change that metadata has already recorded these drives as failed, so changing cables or whatever, by itself, most likely won't fix things. I'm still curious why assembly fails for you, when it should work with two drives missing, but it seems more like a software issue for now. Also curious how this failure originally played out, but without logs, nobody knows. md does not log these things in its metadata despite >100M unused data offset space... did you check journalctl?
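
Pulling the relevant journal entries would look something like this (a sketch; -k restricts output to kernel messages, and the persistent journal must actually reach back to that date):

```shell
# Kernel messages around the time sdg/sdh fell off the array
journalctl -k --since "2026-02-07 12:30:00" --until "2026-02-07 13:00:00"

# Full kernel log of the previous boot (where assembly failed)
journalctl -k -b -1
```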

Last edited by frostschutz (2026-02-11 15:28:18)

Offline

#19 2026-02-11 20:13:15

loqs
Member
Registered: 2014-03-06
Posts: 18,796

Re: RAID6 failure,strange values

frostschutz wrote:

did you check journalctl?

Please post the kernel messages for the boot where the assembly failed.

Offline

#20 2026-02-12 08:37:15

cryptearth
Member
Registered: 2024-02-03
Posts: 1,965

Re: RAID6 failure,strange values

seth wrote:

not my words:

Didn't say so either tongue

Interesting point about the overdimensioned PSU, but the outcome seems overly specific (you'd expect more CPU/RAM related issues?) and wouldn't that render most (optimus) gaming systems unstable that come at 750W but typically idle at mostly 10% of  that?

as for unstable power: from the issues I read it seems to affect both "spinning rust" and solid state equally - both suffer from unclean power
solutions range from using an uninterruptible power supply, over replacing the PSU, to just switching connectors around (hindsight: as OP now mentions "cheap looking SATA power adapters" it could be a lead)
I'm not an (electrical) engineer - but take a look at the crippled design of the 12VHPWR connector: just the connector sitting at a slight angle can cause bad connections that raise temps so high that entire computers have gone up in flames
for hard drive power connectors I suspect similar issues: loss of spring tension inside the connector causing bad connections with high resistance, which causes issues during power spikes, as the capacitors on a drive can only hold so much charge
so, yes, for whatever reason power issues seem to rank high among causes of data issues in an array

Offline
