#1 2015-05-22 18:42:59

jwhendy
Member
Registered: 2010-04-01
Posts: 621

[SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

I'm having issues re-formatting an external hard drive with dm-crypt. It was previously formatted with TrueCrypt/NTFS and served as a shared backup drive between Windows and Arch. At some point it could no longer be mounted, which I attributed to letting Windows "fix" it after an improper dismount (e.g. a hard kill).

I decided to re-format with ext4 and only use it from Arch, but now I'm wondering if I may have a hardware issue with the drive. I've tried a lot more (like going through the full zero write after mounting the drive as a temporary dm-crypt device), but here's the condensed version to illustrate the problem.


system info

This is on a fresh boot. I mention it because I've had issues with kernel modules after an update that brings in a new kernel; a fresh boot rules that out.

$ uname -a
Linux arch_840 4.0.3-1-ARCH #1 SMP PREEMPT Wed May 13 15:38:47 CEST 2015 x86_64 GNU/Linux

$ lsmod | grep dm_
dm_crypt               28672  2 
dm_mod                 98304  5 dm_crypt

$ lsmod | grep xts
xts                    16384  2 serpent_sse2_x86_64,twofish_x86_64_3way
gf128mul               16384  2 lrw,xts

smartctl status

Figured I should check the drive. There are a lot of Old_age and Pre-fail entries, but this post would seem to suggest I'm okay?

# smartctl -A /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-4.0.3-1-ARCH] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   090   089   025    Pre-fail  Always       -       3330
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       703
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3707
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       104
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       734
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       17
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   053   000    Old_age   Always       -       24 (Min/Max 16/47)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       3
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       104
225 Load_Cycle_Count        0x0032   079   079   000    Old_age   Always       -       214068
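
For reading a table like this, note that the TYPE column (Pre-fail / Old_age) only classifies an attribute; it is not a warning by itself. smartctl reports an actual threshold failure in the WHEN_FAILED column, which is "-" everywhere here. The sketch below is a rough filter under two rule-of-thumb assumptions (not an official health test): flag any attribute whose normalized VALUE is at or below a non-zero THRESH, and flag non-zero raw counts on attributes 5, 197 and 198. The function name `flag_smart` is made up for illustration.

```shell
#!/bin/sh
# flag_smart: read `smartctl -A` output on stdin and print attributes that look
# worrying. The rules are rule-of-thumb assumptions, not an official test.
flag_smart() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1 + 0; value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
            if (thresh > 0 && value <= thresh) {
                # Normalized value at or below a non-zero threshold.
                print "BELOW_THRESH " $2
            } else if ((id == 5 || id == 197 || id == 198) && raw > 0) {
                # Reallocated, pending or uncorrectable sectors were counted.
                print "BAD_SECTORS " $2 " raw=" raw
            }
        }'
}

# Sample rows copied from the output above.
sample='  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0'

printf '%s\n' "$sample" | flag_smart    # prints nothing for these rows
# On the real drive (needs root): smartctl -A /dev/sdb | flag_smart
```

A clean result here is not proof of a healthy disk: later in this thread the same drive fails an extended self-test even though this table shows nothing flagged.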

Disk info, delete existing partition, new MBR, create new partition

# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.26.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/sdb: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x76d37b6d

Device     Boot Start       End   Sectors   Size Id Type
/dev/sdb1          63 976768064 976768002 465.8G 83 Linux

Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): o
Created a new DOS disklabel with disk identifier 0x2cd60f13.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 
First sector (2048-976773167, default 2048): 
Last sector, +sectors or +size{K,M,G,T,P} (2048-976773167, default 976773167): 

Created a new partition 1 of type 'Linux' and of size 465.8 GiB.

Command (m for help): w

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
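
As a sanity check on the numbers in that session, the arithmetic can be redone by hand. This is only a sketch using the figures fdisk printed; the variable names are arbitrary. It shows that the new default start sector (2048, versus 63 on the old partition) puts the partition on a 1 MiB boundary, and that the resulting partition length is the same "device size 976771120 sectors" that cryptsetup reports further down.

```shell
#!/bin/sh
# Figures taken from the fdisk session above.
total_sectors=976773168      # "Disk /dev/sdb: ... 976773168 sectors"
sector_bytes=512             # logical sector size
first=2048                   # default first sector of the new partition
last=976773167               # default last sector

disk_bytes=$((total_sectors * sector_bytes))
align_bytes=$((first * sector_bytes))
part_sectors=$((last - first + 1))
part_gib=$((part_sectors * sector_bytes / 1073741824))

echo "disk bytes:        $disk_bytes"
echo "start offset:      $align_bytes bytes"
echo "partition sectors: $part_sectors"
echo "partition GiB:     $part_gib (rounded down)"
```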

trying to format with cryptsetup

I had a bunch of custom options, but other Arch posts suggested just trying the default, which is what I've done here. It fails with the same error as when I try to pass a cipher, key size, etc. ("Command failed with code 5: IO error while encrypting keyslot.").

# cryptsetup -v --debug luksFormat /dev/sdb1
# cryptsetup 1.6.6 processing "cryptsetup -v --debug luksFormat /dev/sdb1"
# Running command luksFormat.
# Locking memory.
# Installing SIGINT/SIGTERM handler.
# Unblocking interruption on signal.

WARNING!
========
This will overwrite data on /dev/sdb1 irrevocably.

Are you sure? (Type uppercase yes): YES
# Allocating crypt device /dev/sdb1 context.
# Trying to open and read device /dev/sdb1.
# Initialising device-mapper backend library.
# Timeout set to 0 miliseconds.
# Iteration time set to 1000 miliseconds.
# Interactive passphrase entry requested.
Enter passphrase: 
Verify passphrase: 
# Formatting device /dev/sdb1 as type LUKS1.
# Crypto backend (gcrypt 1.6.3) initialized.
# Detected kernel Linux 4.0.3-1-ARCH x86_64.
# Topology: IO (512/0), offset = 0; Required alignment is 1048576 bytes.
# Checking if cipher aes-xts-plain64 is usable.
# Using userspace crypto wrapper to access keyslot area.
# Generating LUKS header version 1 using hash sha1, aes, xts-plain64, MK 32 bytes
# KDF pbkdf2, hash sha1: 996745 iterations per second.
# Data offset 4096, UUID 181fed4d-42f2-4f0f-8b70-cb7ba459e25f, digest iterations 121625
# Updating LUKS header of size 1024 on device /dev/sdb1
# Key length 32, device size 976771120 sectors, header size 2050 sectors.
# Reading LUKS header of size 1024 from device /dev/sdb1
# Key length 32, device size 976771120 sectors, header size 2050 sectors.
# Adding new keyslot -1 using volume key.
# Calculating data for key slot 0
# KDF pbkdf2, hash sha1: 1008246 iterations per second.
# Key slot 0 use 492307 password iterations.
# Using hash sha1 for AF in key slot 0, 4000 stripes
# Updating key slot 0 [0x1000] area.
# Using userspace crypto wrapper to access keyslot area.
IO error while encrypting keyslot.
# Releasing crypt device /dev/sdb1 context.
# Releasing device-mapper backend.
# Unlocking memory.
Command failed with code 5: IO error while encrypting keyslot.

Anything touching the drive also tends to hang at this point. For example, fdisk -l prints the /dev/sda partitions immediately, then hangs instead of printing the /dev/sdb info, and eventually quits without ever printing it.

Any suggestions on where to look/how to troubleshoot? I found some possibly related posts, but nothing that looks promising:
- Impossible to crypt the drive using cryptsetup (fixed by rebooting)
- cryptsetup fails to open Udev cookie 0xd4d94f5 (semid 0) waiting for z (no responses; the hang after seems similar)

There are a few scattered references to cryptsetup 1.6.6 having issues. I downloaded 1.6.4-1, 1.6.5-1 and 1.6.5-2 from the ARM (Arch Rollback Machine) to try, but wanted to post this in the meantime in case something stuck out.

Last edited by jwhendy (2015-05-29 16:01:40)

#2 2015-05-25 12:05:22

qinohe
Member
From: Netherlands
Registered: 2012-06-20
Posts: 1,596

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

I had a very similar situation lately: improper dismount, tried everything you did, without success.
The strange thing was that I could format it as ext4 and mount the disk. I would then write a few files to it and it would stop. After a check I found the disk had changed from read/write to read-only. I tried every suggestion I found, no luck.
Also, that code 5 I/O error suggests it can't read from or write to that keyslot, I guess.
You could try formatting the disk as ext4, use it, and see if you can read and write to it; otherwise it might be bricked.

#3 2015-05-25 13:48:16

frostschutz
Member
Registered: 2013-11-15
Posts: 1,647

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

anything in dmesg?

does the disk pass a smartctl -t long self-test?
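
One way to act on both questions is sketched below. The helper name `disk_errors` and its list of error markers are assumptions chosen for illustration, not an exhaustive set of kernel messages; the sample lines are copied from a later post in this thread. The smartctl, dmesg and journalctl invocations in the comments are the standard ones and need root.

```shell
#!/bin/sh
# disk_errors DEV: keep only log lines on stdin that mention DEV together with
# a typical error marker (the marker list is an assumption, not exhaustive).
disk_errors() {
    grep -E "$1" | grep -E 'I/O error|abort|timed out|reset'
}

# Sample kernel log lines from later in this thread.
sample='May 23 09:32:22 arch_840 kernel: usb 3-4: stat urb: status -108
May 23 09:32:19 arch_840 kernel: blk_update_request: I/O error, dev sdb, sector 490735616
May 23 09:32:19 arch_840 kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x2a 2a 00 1d 07 bc 10 00 04 00 00
May 23 09:32:18 arch_840 kernel: sd 2:0:0:0: [sdb] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD OUT'

printf '%s\n' "$sample" | disk_errors sdb

# Against the live system:
#   dmesg | disk_errors sdb
#   journalctl -k -b | disk_errors sdb
# Start the extended self-test, then read the log once it has finished:
#   smartctl -t long /dev/sdb
#   smartctl -l selftest /dev/sdb
```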

#4 2015-05-27 04:21:44

jwhendy
Member
Registered: 2010-04-01
Posts: 621

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

@qinohe I thought of that and the other day started formatting with mkfs.ext4; unfortunately, it was at work and I had to leave before I could let it finish. It had been chugging along a good few hours, and I was surprised it would take that long. I was able to format it with ext4 using Windows 7 (I dual boot) with the MiniTool Partition Wizard but I didn't use it like that before trying to solve the cryptsetup issue again.

This last time around, I was getting unresponsive behavior. I think I need to reboot each time I try something with cryptsetup, as any commands related to that drive seem to hang afterwards (fdisk, umount, eject, mkfs, or trying cryptsetup again). Perhaps I'll just let it cook overnight with mkfs and see if I can at least have an unencrypted, but functional drive.

One interesting tidbit is that even though cryptsetup fails, when I've tried to issue mkfs afterward, it asks me to confirm that I want to format the disk since it has a LUKS header... so something appears to have been written. Is it possible the header is causing some issues? I don't know much about the structure of a disk (like what range the MBR resides in, what constitutes a header, etc.) but have been wondering if there's some way to start really, really clean with the disk. Like I'd just bought it -- something appears to be lingering around from previous efforts?

@frostschutz I'll check tomorrow. That's a good question. Just checked journalctl and here are some of the errors that appear; unfortunately, I wasn't watching so I can't tell you what matches up with what command:

May 23 09:32:22 arch_840 systemd-udevd[7784]: inotify_add_watch(7, /dev/sdb1, 10) failed: No such file or directory

May 23 09:32:22 arch_840 kernel: usb 3-4: stat urb: status -108

### there are lots like this; about 10 in a row with various sector values listed
May 23 09:32:19 arch_840 kernel: Buffer I/O error on dev sdb1, logical block 61341696, lost async page write
May 23 09:32:19 arch_840 kernel: blk_update_request: I/O error, dev sdb, sector 490735616

### there's also a bunch like this, from tag#0 -> tag#29 (not colored red, so not sure they're errors?)
May 23 09:32:19 arch_840 kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x2a 2a 00 1d 07 bc 10 00 04 00 00
May 23 09:32:18 arch_840 kernel: sd 2:0:0:0: [sdb] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD OUT 
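
The CDB line can be decoded by hand. Opcode 0x2a is the SCSI WRITE(10) command; in its ten-byte command block, bytes 2-5 carry the starting LBA and bytes 7-8 the transfer length, both big-endian. The sketch below (the function name `decode_cdb10` is made up) decodes the bytes from the line above. Lines like this usually accompany a command that timed out or was aborted, which would fit the uas_eh_abort_handler message next to it, but that reading is an interpretation rather than something the log states.

```shell
#!/bin/sh
# decode_cdb10: take the ten hex bytes of a READ(10)/WRITE(10) command block
# and print the operation, starting LBA and length in sectors.
decode_cdb10() {
    case "$1" in
        2a) op=WRITE ;;
        28) op=READ ;;
        *)  op="opcode-$1" ;;
    esac
    # Arguments 3-6 are CDB bytes 2-5 (LBA); arguments 8-9 are bytes 7-8 (length).
    lba=$((0x$3$4$5$6))
    len=$((0x$8$9))
    echo "$op lba=$lba sectors=$len"
}

# CDB bytes from the journal line above.
decode_cdb10 2a 00 1d 07 bc 10 00 04 00 00    # → WRITE lba=487046160 sectors=1024
```

So this entry describes a 1024-sector (512 KiB) write starting at sector 487046160 of the disk.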

I paged down quite a ways and those seem like the unique messages when I search the journal for "sdb". Anything stand out? I will say that the same sector numbers appeared in multiple blocks of the third error type listed, so that makes me wonder if something is genuinely wrong with the disk. I'll post the output of the full smartctl scan when I hopefully run it tomorrow.
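
The "Buffer I/O error" and "blk_update_request" lines quoted above seem to describe the same failed write from two viewpoints: the first counts blocks relative to the partition sdb1, the second counts 512-byte sectors from the start of the disk. A minimal sketch of the conversion, assuming a 4096-byte block size for the buffer layer (an assumption; it is not printed in the log) and the partition start of 2048 from the fdisk session in the first post:

```shell
#!/bin/sh
# block_to_sector BLOCK PART_START [BLOCK_BYTES] [SECTOR_BYTES]
# Convert a partition-relative "logical block" from a Buffer I/O error line
# into a whole-disk sector number.
block_to_sector() {
    block=$1; part_start=$2
    block_bytes=${3:-4096}     # assumed buffer block size
    sector_bytes=${4:-512}     # logical sector size reported by fdisk
    echo $((block * (block_bytes / sector_bytes) + part_start))
}

# logical block 61341696 on sdb1, partition starting at sector 2048
block_to_sector 61341696 2048    # → 490735616
```

The result equals the sector in the blk_update_request line, which supports both assumptions and means the pair counts as one failure, not two.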

Thanks for chiming in!

#5 2015-05-27 09:42:09

qinohe
Member
From: Netherlands
Registered: 2012-06-20
Posts: 1,596

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

jwhendy wrote:

This last time around, I was getting unresponsive behavior. I think I need to reboot each time I try something with cryptsetup, as any commands related to that drive seem to hang afterwards (fdisk, umount, eject, mkfs, or trying cryptsetup again). Perhaps I'll just let it cook overnight with mkfs and see if I can at least have an unencrypted, but functional drive.

Let it cook overnight? That doesn't seem right; it should be done in a matter of moments ;) Or how old is that machine?

jwhendy wrote:

One interesting tidbit is that even though cryptsetup fails, when I've tried to issue mkfs afterward, it asks me to confirm that I want to format the disk since it has a LUKS header... so something appears to have been written. Is it possible the header is causing some issues? I don't know much about the structure of a disk (like what range the MBR resides in, what constitutes a header, etc.) but have been wondering if there's some way to start really, really clean with the disk. Like I'd just bought it -- something appears to be lingering around from previous efforts?

Start clean, with something like

dd if=/dev/zero of=/dev/sdX
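
On the question of what lingers on the disk: a DOS/MBR partition table lives in sector 0 (the first 512 bytes of the disk), and per the cryptsetup debug output in the first post the LUKS1 data offset is 4096 sectors, so the header and keyslots sit in the first 2 MiB of the partition. A full-disk dd with /dev/zero overwrites all of that along with everything else. The sketch below rehearses wiping only the header area on a scratch image file, not on a real device; the 8 MiB image size and the variable names are arbitrary choices for the demonstration.

```shell
#!/bin/sh
# Rehearsal on a temporary image file; nothing here touches a real disk.
img=$(mktemp)
# 8 MiB of random data as a stand-in for the partition.
dd if=/dev/urandom of="$img" bs=1M count=8 2>/dev/null

# LUKS1 data offset from the cryptsetup debug log: 4096 sectors of 512 bytes.
data_offset_sectors=4096
header_bytes=$((data_offset_sectors * 512))
header_mib=$((header_bytes / 1048576))

# Zero only the header area; conv=notrunc leaves the rest of the file in place.
dd if=/dev/zero of="$img" bs=1M count="$header_mib" conv=notrunc 2>/dev/null

# Count non-zero bytes left in the wiped region and check the file size.
nonzero=$(head -c "$header_bytes" "$img" | tr -d '\000' | wc -c)
size=$(wc -c < "$img")
echo "wiped ${header_mib} MiB, non-zero bytes left: $nonzero, image size: $size"
rm -f "$img"
```

On a real partition, `wipefs -a` from util-linux is another way to remove leftover signatures. Neither approach can repair a drive that is returning I/O errors.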

Wish you luck with it, but my guess is you're trying to reanimate an already dead disk.

#6 2015-05-27 12:59:21

frostschutz
Member
Registered: 2013-11-15
Posts: 1,647

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

You should investigate the cause of those I/O errors; no point doing anything else with it, unless you enjoy data loss.

#7 2015-05-28 19:24:04

jwhendy
Member
Registered: 2010-04-01
Posts: 621

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

Thanks to both of you. I googled mkfs.ext4 times and now see it shouldn't be taking that long. I'm running a smartctl selftest right now (estimate of ~2hrs to complete), and will give the zero overwrite a try. I'll post back with the smartctl info when it completes. I'm tending to agree that this may be looking more and more like a dead disk. This has never happened to me before... perhaps I'm just reluctant to admit defeat :)

#8 2015-05-28 21:19:55

jwhendy
Member
Registered: 2010-04-01
Posts: 621

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

Assuming this means it's dying (read failure)?

# smartctl -l selftest /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-4.0.4-2-ARCH] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%      3714         524232704
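
The last column is the first sector the self-test could not read. A small sketch for pulling it out of the log and locating it on the disk follows; the function name `first_error_lba` is made up, and the total sector count comes from the fdisk output in the first post.

```shell
#!/bin/sh
# first_error_lba: read `smartctl -l selftest` output on stdin and print the
# LBA_of_first_error of the most recent test (entry "# 1"), or nothing if "-".
first_error_lba() {
    awk '/^# *1 / && $NF != "-" { print $NF; exit }'
}

# Log entry copied from the output above.
sample='# 1  Extended offline    Completed: read failure       60%      3714         524232704'
lba=$(printf '%s\n' "$sample" | first_error_lba)
total=976773168     # sectors on the disk, from fdisk

echo "first bad LBA: $lba"
echo "byte offset:   $((lba * 512))"
echo "position:      $((lba * 100 / total))% into the disk"

# A read-only probe of just that sector (needs root):
#   dd if=/dev/sdb of=/dev/null bs=512 skip=$lba count=1 iflag=direct
```

If that dd probe also fails with an I/O error, it confirms the self-test result independently of cryptsetup or any filesystem.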

#9 2015-05-29 10:09:13

frostschutz
Member
Registered: 2013-11-15
Posts: 1,647

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

Yup. If you zero the drive completely with dd, mayhap it will reallocate some sectors, but such a drive is no longer trustworthy for important data.

#10 2015-05-29 16:00:52

jwhendy
Member
Registered: 2010-04-01
Posts: 621

Re: [SOLVED] errors with fdisk and cryptsetup; is my drive going bad?

Good to know. I guess I'll be getting a new drive. Bummer to have spent so much time assuming I was doing cryptsetup wrong to find out it's this! Just for comparison, I checked a different drive today and get this from the short test:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        88         -

So clearly a difference in the status. Thanks for all the assistance -- will update title (as I think it would be better for this to match hits around drive health and not people looking to solve cryptsetup issues) and mark it solved.
