
#1 2015-02-07 11:43:06

Grus
Member
Registered: 2011-12-31
Posts: 21

RAID array gone after reboot, mdadm skipping every drive because busy

Yesterday I shut down my PC normally, but when I started it up today the boot failed. I have a RAID5 array (/dev/md127) mounted as /data in /etc/fstab. On boot, the system waits 1m30s for the service mounting /data to complete, then fails and drops me into an emergency shell. After commenting out the /data line in /etc/fstab I can boot normally, but I can't seem to diagnose the array at all. I assembled it on a previous installation and haven't done anything to it on this one; it always showed up automatically as /dev/md127, along with /dev/md127p1 (which is what gets mounted on /data). This time only /dev/md127 is there, and when I try to assemble the array it tells me every drive is "busy":

/home/grus  sudo mdadm --examine --scan
ARRAY /dev/md/data  metadata=1.2 UUID=0f8c1f2c:69356c09:b74f75f6:0d321be4 name=mastermind:data

/home/grus  sudo mdadm --assemble --scan
mdadm: No arrays found in config file or automatically

/home/grus  sudo mdadm --assemble /dev/md127 /dev/sdb /dev/sdd /dev/sde /dev/sdf
mdadm: /dev/sdb is busy - skipping
mdadm: /dev/sdd is busy - skipping
mdadm: /dev/sde is busy - skipping
mdadm: /dev/sdf is busy - skipping

/home/grus  sudo mdadm --assemble --scan -v
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/sdc1
mdadm: Cannot assemble mbr metadata on /dev/sdc
mdadm: /dev/sdf is busy - skipping
mdadm: /dev/sdd is busy - skipping
mdadm: /dev/sdb is busy - skipping
mdadm: /dev/sde is busy - skipping
mdadm: no recogniseable superblock on /dev/sda2
mdadm: no recogniseable superblock on /dev/sda1
mdadm: Cannot assemble mbr metadata on /dev/sda
mdadm: No arrays found in config file or automatically

Why is it suddenly doing that? None of these drives is used by anything else; there's nothing on them other than the RAID array. I tried closing every process that wasn't system-critical, but mdadm still reports the drives as busy, and I have no idea what might be using them.

/home/grus  cat /proc/mdstat
Personalities : 
md127 : inactive sdb[0](S) sdd[1](S) sdf[4](S) sde[2](S)
      11720542048 blocks super 1.2
       
unused devices: <none>


I really don't know how to handle this problem, or what caused it. I just can't figure out why that would happen suddenly, and why it reports that all my drives are busy. Did all my HDDs fail?


#2 2015-02-07 11:46:45

frostschutz
Member
Registered: 2013-11-15
Posts: 1,421

Re: RAID array gone after reboot, mdadm skipping every drive because busy

The drives are busy because an md127 array already exists that holds them: your /proc/mdstat shows it as inactive, with all four disks attached as spares (S).

Post the output of mdadm --examine /dev/sd*
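
To see which array is already holding the disks, /proc/mdstat can be filtered for inactive arrays. This is only a minimal sketch, assuming the mdstat layout shown in the first post; the function name inactive_arrays is made up for illustration:

```shell
# Print each inactive md array and its member devices, reading
# mdstat-format text from stdin (normally /proc/mdstat).
# inactive_arrays is a hypothetical helper, not an mdadm command.
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" {
        printf "%s:", $1
        for (i = 4; i <= NF; i++) {
            dev = $i
            sub(/\[.*/, "", dev)   # strip suffixes such as "[0](S)"
            printf " %s", dev
        }
        printf "\n"
    }'
}

# Usage:
#   inactive_arrays < /proc/mdstat
```

Fed the /proc/mdstat from the first post, this should name md127 together with sdb, sdd, sdf and sde. An array listed this way can be released with mdadm --stop /dev/md127 before assembling again.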


#3 2015-02-07 11:53:12

Grus
Member
Registered: 2011-12-31
Posts: 21

Re: RAID array gone after reboot, mdadm skipping every drive because busy

You're right, thank you so much!

/home/grus  sudo mdadm --stop /dev/md127
mdadm: stopped /dev/md127
/home/grus  sudo mdadm --assemble /dev/md127 /dev/sdb /dev/sdd /dev/sde /dev/sdf -f
mdadm: Marking array /dev/md127 as 'clean'
mdadm: /dev/md127 has been started with 3 drives (out of 4).

/home/grus  cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md127 : active raid5 sdb[0] sdf[4] sdd[1]
      8790405120 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      
unused devices: <none>

So now I have it assembled again, which is nice. But why is it only starting with 3 out of 4 drives?

/home/grus  sudo mdadm --examine /dev/sd*
/dev/sda:
   MBR Magic : aa55
Partition[0] :     16777216 sectors at         2048 (type 83)
Partition[1] :    106312656 sectors at     16779264 (type 83)
mdadm: No md superblock detected on /dev/sda1.
mdadm: No md superblock detected on /dev/sda2.
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 0f8c1f2c:69356c09:b74f75f6:0d321be4
           Name : mastermind:data  (local to host mastermind)
  Creation Time : Tue Jun 18 01:40:44 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=944 sectors
          State : clean
    Device UUID : 68d4c801:e72efab7:ba11f635:ae0eff67

    Update Time : Sat Feb  7 02:05:29 2015
       Checksum : b46fb5d3 - correct
         Events : 1962

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA.A ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
   MBR Magic : aa55
Partition[0] :   3907026944 sectors at         2048 (type 07)
mdadm: No md superblock detected on /dev/sdc1.
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 0f8c1f2c:69356c09:b74f75f6:0d321be4
           Name : mastermind:data  (local to host mastermind)
  Creation Time : Tue Jun 18 01:40:44 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=944 sectors
          State : active
    Device UUID : fccfa934:25450a96:3627bd58:f84590c0

    Update Time : Sat Feb  7 02:05:29 2015
       Checksum : 40b9146d - correct
         Events : 1962

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA.A ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 0f8c1f2c:69356c09:b74f75f6:0d321be4
           Name : mastermind:data  (local to host mastermind)
  Creation Time : Tue Jun 18 01:40:44 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=944 sectors
          State : clean
    Device UUID : e792a639:3a59d3b6:e90e8308:25953c62

    Update Time : Mon Feb  2 13:57:41 2015
       Checksum : b7ea2acc - correct
         Events : 185

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 0f8c1f2c:69356c09:b74f75f6:0d321be4
           Name : mastermind:data  (local to host mastermind)
  Creation Time : Tue Jun 18 01:40:44 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=944 sectors
          State : active
    Device UUID : 285d5db8:a5e92c3c:552b52a1:3cd97d89

    Update Time : Sat Feb  7 02:05:29 2015
       Checksum : 7c11dd7f - correct
         Events : 1962

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AA.A ('A' == active, '.' == missing, 'R' == replacing)

The Array State is AA.A on every functioning drive in the array, but on /dev/sde - the skipped drive - it's AAAA. I'm assuming that means it's the third drive that's missing, which it is, but why?

What I also don't get is - the array was already assembled after boot, but somehow not assembled enough so I could access /dev/md127p1, or even have it show up? Why did I need to stop and re-assemble it, why did normal assembly not work during boot? It couldn't be busy then.

Alright, it boots normally now that I have stopped and re-assembled it once after boot. I really don't understand why, but it boots fine, the array is assembled, and /dev/md127p1 shows up too. It is still being assembled without /dev/sde, though, which I don't get. I'm not sure how to check on that drive: I could look at the SMART data, but that wouldn't tell me how mdadm feels about it. I can't read much from --examine either; what am I supposed to be looking for? It says the Device Role for /dev/sde is Active device 2, but it's not active, and if it's device 2 then what is device 3? /proc/mdstat says "md127 : active raid5 sdb[0] sdd[1] sdf[4]"; wouldn't that imply that both 2 and 3 are missing? But I don't have five drives (0-4), only four. --examine also lists "Events" for every drive; is there any way to look at these "events"? I keep going through the documentation, both the man page and the wiki, but I can't find anything about how to proceed.

Last edited by Grus (2015-02-07 12:08:32)


#4 2015-02-07 12:28:11

frostschutz
Member
Registered: 2013-11-15
Posts: 1,421

Re: RAID array gone after reboot, mdadm skipping every drive because busy

Grus wrote:

I'm assuming that means it's the third drive that's missing, which it is, but why?

Well, it failed for some reason. Did you check smartctl -a for the disks? If it has reallocated or pending sectors, you should get a replacement. Otherwise just re-add it and hope it will sync.
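
The smartctl check can be narrowed to the two counters in question. This is a rough sketch, assuming the usual ten-column attribute table printed by smartctl -A and the common attribute names Reallocated_Sector_Ct and Current_Pending_Sector (some drives label them differently); bad_sectors is a hypothetical helper name:

```shell
# Read "smartctl -A" attribute output on stdin, print the raw values of
# the reallocated and pending sector counters, and return non-zero if
# either raw value is above zero.
# Assumption: the raw value is the 10th whitespace-separated column.
bad_sectors() {
    awk '
        $2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
            print $2, $10
            if ($10 + 0 > 0) bad = 1
        }
        END { exit bad }
    '
}

# Usage (only re-add the disk when both counters are zero):
#   if sudo smartctl -A /dev/sde | bad_sectors; then
#       sudo mdadm /dev/md127 --re-add /dev/sde
#   fi
```

If mdadm refuses the --re-add, adding the disk with --add is the usual fallback, at the cost of a full resync.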

Why did I need to stop and re-assemble it

Hard to tell. If you did not reboot in between, or root is not on the raid, dmesg and/or the syslog may be interesting...

The failure occurred on Mon Feb  2 13:57:41 2015; that's the Update Time of your /dev/sde. Check your system logs for what happened around that time.
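
The stale member can also be spotted by comparing the Events counter and the Array State that each member recorded in its superblock. This is a minimal sketch, assuming the --examine layout posted above; member_events is a made-up helper name:

```shell
# Summarise "mdadm --examine" output read from stdin: one line per array
# member with its device name, event counter and last recorded array state.
# Devices without an md superblock print nothing.
member_events() {
    awk '
        /^\/dev\/[^ ]+:$/               { dev = $1; sub(/:$/, "", dev) }
        $1 == "Events" && $2 == ":"     { events[dev] = $3 }
        $1 == "Array" && $2 == "State"  { print dev, events[dev], $4 }
    '
}

# Usage:
#   sudo mdadm --examine /dev/sdb /dev/sdd /dev/sde /dev/sdf | member_events
```

On the output above this would list /dev/sdb, /dev/sdd and /dev/sdf at 1962 events with AA.A, and /dev/sde at 185 events with AAAA. The member with the lower event count stopped receiving superblock updates first, which is why it still records all four slots as active.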

Last edited by frostschutz (2015-02-07 12:28:43)


#5 2015-02-07 12:43:44

Grus
Member
Registered: 2011-12-31
Posts: 21

Re: RAID array gone after reboot, mdadm skipping every drive because busy

Thank you so much for your help. SMART seems fine to me: some attributes are of the pre-fail type, but no errors are reported. I'm running a longer test now. The failure on Monday is peculiar, because I've used the array without problems since then; this is the first boot this week with any problems. I'm not really sure which logs to check. dmesg doesn't have any weird messages in it, and I'm not sure where the other logs are located. / is on an SSD; the array just holds random data. I'm not sure which log was presented to me in the emergency console, but the system couldn't boot because it couldn't mount the local filesystems, which I'm guessing is because the /dev/md127 entry in /etc/fstab couldn't be mounted. I don't really see why that would keep root from being mounted, but booting worked fine as soon as the array was commented out of fstab.

/var/log  ls
journal/
old/
samba/
btmp
btmp.1
faillog
lastlog
pacman.log
wtmp
Xorg.0.log

Last edited by Grus (2015-02-07 12:47:13)


#6 2015-02-07 12:47:17

frostschutz
Member
Registered: 2013-11-15
Posts: 1,421

Re: RAID array gone after reboot, mdadm skipping every drive because busy

You'll probably need `journalctl` to display and filter what is stored under journal/.
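
For the failure time mentioned earlier, the journal can be limited to a window around that timestamp with --since and --until. This is a small sketch, assuming GNU date is available and that the Update Time is in the local time zone; journal_window and the ten-minute margin are arbitrary choices for illustration:

```shell
# Print two lines: the start and end of a +/- 10 minute window around the
# given timestamp, formatted for journalctl --since / --until.
# Assumption: GNU date (-d); times are interpreted in the local time zone.
journal_window() {
    ts=$(date -d "$1" +%s) || return 1
    date -d "@$((ts - 600))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((ts + 600))" '+%Y-%m-%d %H:%M:%S'
}

# Usage:
#   journal_window '2015-02-02 13:57:41' | {
#       read -r start
#       read -r end
#       journalctl --since "$start" --until "$end"
#   }
```

Kernel messages from the previous boot, where the failed assembly happened, should be available with journalctl -k -b -1 as long as the journal is persistent.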

