#1 2015-09-10 11:55:44

millz
Member
Registered: 2013-12-29
Posts: 30

LVM (on Raid and LUKS) behaves weird and don't want to get mounted

Hey Guys,

I'm facing a serious problem with my RAID1. Maybe you guys know how to solve it.

First, I'll give you a short overview of my setup. Then I'll briefly describe the error. Finally, I'll list what I've already tried and what didn't work.

Setup
My RAID1 is based on two HDDs, each 4 TB in size. They are combined as RAID1 via mdadm. The RAID device (/dev/md127) is then encrypted using cryptsetup/dm-crypt/LUKS. On top of the LUKS device, I've configured a volume group (vgroup) with 5 logical volumes of different sizes. This HDD setup is connected to a server (an ODROID).

That setup worked smoothly for almost 2.5 years, until last week, when something strange happened.

Error
One day last week, my server showed some strange behavior. I did some research and found that the I/O load on my CPU was at 100%. dmesg showed some messages that fed my fear that one of the HDDs was about to crash.

I identified the HDD which was about to crash. Let's call that device /dev/sdc (c as in crashed). The other device works just fine (at least it seems to). Let's call that device /dev/sdw (w as in working). I shut down the server, bought a new 4 TB HDD, connected it to the server, started the server again, and added the new device to the RAID1 (sdn, n as in new).

I watched the recovery process via cat /proc/mdstat and was very happy to see my issue about to be solved so quickly. After all, that was my intention when setting up a RAID...

When the recovery had finished, I decrypted the RAID device (worked!) and could see my volume group with all my logical volumes. But the story wasn't over... I couldn't mount any of the logical volumes! That is why this thread is titled as an LVM issue, in the hope that you have some advice on how to get my data back!

What I did
- I tried to mount each of the logical volumes with every plausible filesystem type (ext2, ext3, ext4, f2fs, reiserfs); afaik, ext4 was used.
- I listed all backup superblocks (quite a long list) and tried e2fsck -f -b <superblock> with each of them on an example device (of which I made a raw dd copy first and used that copy for these experiments).
- I tried to recover and recreate the superblock (on the raw copy of one example logical volume).

Nothing got me out of my dilemma. Now I'm almost out of ideas, and I hope you can help me. It's worth mentioning that I have 3 HDDs (sdc, sdw and sdn). sdw and sdn are working as a RAID, as mentioned before. sdc still runs (more or less) and can be attached to a different computer. It's quite important to mention here: the logical volumes on this device are mountable.

Unfortunately, I cannot copy the data properly. I tried to make a raw copy with dd of two different logical volumes (the most important of the 5), but after some GB the dd copy aborts with I/O errors. The same happens with an rsync backup at some point.

Do you have any clue? Maybe combining the superblocks of sdc's logical volumes with those of the RAID's logical volumes, or something like that?

Thanks for helping me, guys!

Cheers,
millz

PS: If you need any further information, I'll gladly provide it.

Last edited by millz (2015-09-10 20:58:09)

#2 2015-09-10 13:38:28

frostschutz
Member
Registered: 2013-11-15
Posts: 1,417

Re: LVM (on Raid and LUKS) behaves weird and don't want to get mounted

smartctl -a /dev/sd? and mdadm --examine /dev/sd?*

If sdc is the only disk that works but is partially broken, your only option is to ddrescue it to an intact disk and go from there.

Which commands did you use to rebuild the RAID?

Last edited by frostschutz (2015-09-10 13:46:57)

#3 2015-09-10 21:10:37

millz
Member
Registered: 2013-12-29
Posts: 30

Re: LVM (on Raid and LUKS) behaves weird and don't want to get mounted

Since you wrote /dev/sd?, I assume you mean /dev/sdc?

Here it is:

# smartctl -a /dev/sdc
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.6-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

Read Device Identity failed: scsi error unsupported field in scsi command

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
# mdadm --examine /dev/sdc
mdadm: No md superblock detected on /dev/sdc.
[root@host root]# mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 59243c7c:2501e96d:84c4a226:121f2ff0
           Name : someothercomputer:0
  Creation Time : Wed Sep  4 18:15:07 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 7813568128 (3725.80 GiB 4000.55 GB)
     Array Size : 3906783872 (3725.80 GiB 4000.55 GB)
  Used Dev Size : 7813567744 (3725.80 GiB 4000.55 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=384 sectors
          State : active
    Device UUID : b24496e4:bd13b615:ac569122:4dd548ad

    Update Time : Wed Sep  9 23:45:21 2015
       Checksum : 9cac1a5b - correct
         Events : 168


   Device Role : Active device 1
   Array State : .A ('A' == active, '.' == missing, 'R' == replacing)

// edit: adding smartctl -a -T verypermissive /dev/sdc

# smartctl -a -T verypermissive /dev/sdc
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.6-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

Read Device Identity failed: scsi error unsupported field in scsi command

=== START OF INFORMATION SECTION ===
Device Model:     [No Information Found]
Serial Number:    [No Information Found]
Firmware Version: [No Information Found]
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   [No Information Found]
Local Time is:    Thu Sep 10 23:12:43 2015 CEST
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
                  Checking to be sure by trying SMART RETURN STATUS command.
SMART support is: Unknown - Try option -s with argument 'on' to enable it.
Read SMART Data failed: scsi error unsupported field in scsi command

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: scsi error unsupported field in scsi command
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

Read SMART Error Log failed: scsi error unsupported field in scsi command

Read SMART Self-test Log failed: scsi error unsupported field in scsi command

Selective Self-tests/Logging not supported

Last edited by millz (2015-09-10 21:14:17)

#4 2015-09-11 15:37:21

millz
Member
Registered: 2013-12-29
Posts: 30

Re: LVM (on Raid and LUKS) behaves weird and don't want to get mounted

Just in case this could be useful, I'll add the same information for /dev/sdw:

# mdadm --examine /dev/sdw1
/dev/sdw1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x8
     Array UUID : 59243c7c:2501e96d:84c4a226:121f2ff0
           Name : someothercomputer:0
  Creation Time : Wed Sep  4 18:15:07 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 7813568128 (3725.80 GiB 4000.55 GB)
     Array Size : 3906783872 (3725.80 GiB 4000.55 GB)
  Used Dev Size : 7813567744 (3725.80 GiB 4000.55 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=384 sectors
          State : clean
    Device UUID : 92568563:dd2be61b:692de8f7:e7f3118c

    Update Time : Mon Sep  7 11:13:08 2015
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : d5eee6cc - correct
         Events : 188


   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
# smartctl -a /dev/sdw
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.6-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD40EZRX-00SPEB0
Serial Number:    WD-WCC4E4YSXLYE
LU WWN Device Id: 5 0014ee 20c0da74b
Firmware Version: 80.00A80
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Sep 11 17:35:36 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(54240) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 542) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x7035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   181   180   021    Pre-fail  Always       -       7908
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       35
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       30
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       30
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       59
194 Temperature_Celsius     0x0022   127   105   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

In addition, here is what happens when /dev/sdw is assembled as RAID (/dev/md127), decrypted, and a logical volume (/dev/mapper/vgroup-lvcontainer) is mounted. The error output was in German; translated to English it reads:

$ sudo mount /dev/mapper/vgroup-lvcontainer remoteContainer/
mount: wrong filesystem type, invalid options, the
superblock of /dev/mapper/vgroup-lvcontainer is damaged, missing
codepage, or some other error

       Sometimes the system log provides valuable information -
       try  dmesg | tail  or similar

$ dmesg | tail
[  154.511085] sd 8:0:0:0: [sdd] No Caching mode page found
[  154.511090] sd 8:0:0:0: [sdd] Assuming drive cache: write through
[  154.511635] sd 8:0:0:0: [sdd] 976746240 4096-byte logical blocks: (3.90 TB/3.63 TiB)
[  154.654843]  sdd: sdd1
[  154.655313] sd 8:0:0:0: [sdd] 976746240 4096-byte logical blocks: (3.90 TB/3.63 TiB)
[  154.655803] sd 8:0:0:0: [sdd] Attached SCSI disk
[  154.924050] md: bind<sdd1>
[  188.094729] md: raid1 personality registered for level 1
[  188.094963] md/raid1:md127: active with 1 out of 2 mirrors
[  188.094985] md127: detected capacity change from 0 to 4000546684928

Unfortunately, dmesg doesn't say anything about the lvm mounting issue... :-(
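
As a sanity check on the sizes, the capacity in the dmesg output above can be compared with the mdadm --examine output. A small sketch, assuming mdadm prints Array Size in KiB and Used Dev Size in 512-byte sectors (the helper names are made up for this post):

```shell
# Convert a size given in KiB (assumed unit of "Array Size") to bytes.
kib_to_bytes() {
    echo $(( $1 * 1024 ))
}

# Convert a size given in 512-byte sectors (assumed unit of "Used Dev Size") to bytes.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Example calls with the numbers from the examine output:
#   kib_to_bytes 3906783872
#   sectors_to_bytes 7813567744
```

Both calls give 4000546684928, the same number dmesg reports for md127, so at least the size of the array looks as expected.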

Last edited by millz (2015-09-11 15:43:16)

#5 2015-09-11 16:05:39

frostschutz
Member
Registered: 2013-11-15
Posts: 1,417

Re: LVM (on Raid and LUKS) behaves weird and don't want to get mounted

And what does file -sL /dev/vg/lv report?

Is that other disk connected in a strange way? Sometimes you need -d sat or similar if it's usb...
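
For instance, a loop like the following could try a few -d types until smartctl manages to talk to the disk. Only a sketch: smart_probe is a made-up name, the list of types is a guess that depends on the USB bridge, and the check relies on bits 0-2 of smartctl's exit status meaning that the command line, the device open, or a SMART command failed.

```shell
# Try several smartctl device types and print the first one that works.
# $1: device node, e.g. /dev/sdc
smart_probe() {
    dev=$1
    # Candidate types are an assumption; adjust them to the USB bridge in use.
    for type in sat usbjmicron usbsunplus usbcypress; do
        smartctl -i -d "$type" "$dev" >/dev/null 2>&1
        status=$?
        # Bits 0-2 set: smartctl could not parse, open, or query the device.
        # Higher bits only describe disk health, so they count as "reachable".
        if [ $(( status & 7 )) -eq 0 ]; then
            echo "$type"
            return 0
        fi
    done
    return 1
}

# Example call:
#   smart_probe /dev/sdc
```

If a type is printed, smartctl -a -d <type> /dev/sdc should then show the full report.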

The only thing that seems odd to me is your super low event count for those raids... it could be normal but all of my RAIDs have event counts of several tens of thousands. Is there a chance you were using a disk directly ignoring the RAID layer? In theory it's possible with RAID 1 but ...

I don't see anything else that might suggest why data is missing on one side.
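
For comparing the two members, the relevant fields can be pulled out of the mdadm --examine output. This is only a sketch: md_field is a made-up helper name, and it assumes the "Label : value" layout shown in the outputs above.

```shell
# Print the value of one field from `mdadm --examine` output read on stdin.
# $1: field label exactly as printed, e.g. "Events" or "Update Time"
md_field() {
    awk -F' : ' -v key="$1" '
        {
            label = $1
            gsub(/^ +| +$/, "", label)   # strip the alignment padding
        }
        label == key { print $2; exit }
    '
}

# Example calls:
#   mdadm --examine /dev/sdc1 | md_field Events
#   mdadm --examine /dev/sdw1 | md_field "Update Time"
```

With the outputs posted above, this gives Events 168 (updated Sep 9) for sdc1 and Events 188 (updated Sep 7) for sdw1.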

Last edited by frostschutz (2015-09-11 16:08:32)

#6 2015-09-11 20:23:13

millz
Member
Registered: 2013-12-29
Posts: 30

Re: LVM (on Raid and LUKS) behaves weird and don't want to get mounted

Well, it is USB, indeed, for all of these devices.

What's meant by "-d sat"?

Here is the output of file -sL (based on /dev/sdn, which should be similar to /dev/sdw):

# file -sL /dev/vgroup/lv*
/dev/vgroup/lvcontainer: data
/dev/vgroup/lvdivmedia:  data
/dev/vgroup/lvmovies:    data
/dev/vgroup/lvmusic:     Linux rev 1.0 ext4 filesystem data, UUID=43b15ad4-8424-4968-8176-b647ec5c60fb, volume name "Music" (extents) (large files) (huge files)
/dev/vgroup/lvtvshows:   data
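
To pick out the volumes where file finds no filesystem signature, the output above can be filtered. A rough sketch; unrecognized_lvs is a hypothetical helper, and it assumes the "path: description" layout printed by file -sL:

```shell
# Read `file -sL` output on stdin and print the paths that file
# could only describe as plain "data" (no recognizable signature).
unrecognized_lvs() {
    awk -F': +' '$2 == "data" { print $1 }'
}

# Example call:
#   file -sL /dev/vgroup/lv* | unrecognized_lvs
```

On the output above it lists lvcontainer, lvdivmedia, lvmovies and lvtvshows; only lvmusic still carries an ext4 signature.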

I should emphasize that I'm checking the devices separately, so only one device is connected at a time. In each case, I access the device via /dev/md127 and not directly via /dev/sd[cwn]. To be precise, I do:

# cryptsetup luksOpen /dev/md127 raid

Maybe it's useful to show this as well (mount output translated from German):

# mount /dev/vgroup/lvmusic remoteMusic/
mount: wrong filesystem type, invalid options, the
superblock of /dev/mapper/vgroup-lvmusic is damaged, missing
codepage, or some other error

       Sometimes the system log provides valuable information -
       try  dmesg | tail  or similar
[root@host millz]# dmesg | tail
[13302.584684] ...v1.10 Mouse...
[17213.336352] JBD2: no valid journal superblock found
[17213.336355] EXT4-fs (dm-4): error loading journal
[root@host millz]# mount /dev/vgroup/lvcontainer remoteContainer/
mount: wrong filesystem type, invalid options, the
superblock of /dev/mapper/vgroup-lvcontainer is damaged, missing
codepage, or some other error

       Sometimes the system log provides valuable information -
       try  dmesg | tail  or similar
[root@host millz]# dmesg | tail
[13302.584684] ...v1.10 Mouse...
[17213.336352] JBD2: no valid journal superblock found
[17213.336355] EXT4-fs (dm-4): error loading journal

Last edited by millz (2015-09-11 20:25:10)

#7 2015-09-13 18:09:52

millz
Member
Registered: 2013-12-29
Posts: 30

Re: LVM (on Raid and LUKS) behaves weird and don't want to get mounted

Update: I attached /dev/sdc and ran ddrescue. Here's the output:

# ddrescue /dev/mapper/vgroup-lvcontainer /mnt/usbhdd/lvcontainer.img
GNU ddrescue 1.19
Press Ctrl-C to interrupt
rescued:   125038 MB,  errsize:   53248 B,  current rate:   46268 kB/s
   ipos:   125039 MB,   errors:       1,    average rate:   40636 kB/s
   opos:   125039 MB, run time:   51.28 m,  successful read:       0 s ago
Copying non-tried blocks... Pass 1 (forwards)

It has been running for 6 hours but is stuck at this point...
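
In case it helps anyone following along: ddrescue accepts a logfile as a third argument, so an interrupted run can be resumed instead of started over. Below is a dry-run sketch that only prints two commands one might run; the function name, the file paths and the retry count of 3 are assumptions, not something taken from this thread.

```shell
# Print (do not execute) a two-pass ddrescue plan.
# $1: source device, $2: image file, $3: logfile
ddrescue_plan() {
    src=$1
    img=$2
    log=$3
    # Pass 1: copy everything that reads easily and skip the slow scraping phase.
    printf 'ddrescue -n %s %s %s\n' "$src" "$img" "$log"
    # Pass 2: same logfile, direct disc access, retry the bad areas 3 times.
    printf 'ddrescue -d -r3 %s %s %s\n' "$src" "$img" "$log"
}

# Example call (paths are placeholders):
#   ddrescue_plan /dev/mapper/vgroup-lvcontainer /mnt/usbhdd/lvcontainer.img /mnt/usbhdd/lvcontainer.log
```

Because both passes share the logfile, the second one only works on the areas the first one could not read.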

Last edited by millz (2015-09-13 19:58:05)
