Hey all. Got a 4tb external Toshiba HDD recently that I wanted to use for all the stuff I make. I was using udiskie for a while but got sick of the drive getting corrupted every time my PC froze or lost power.
I eventually put it in /etc/fstab and tinkered with it for a night. I initially tried ntfs-3g but fell back on ntfs3 after it threw a bunch of read/write errors.
Now every time I try to load some files (be it games, music, whatever) through it, it reboots and remounts under /dev/sdb instead of sda like it normally is.
I've run chkdsk /f under Windows more times than I can count; it does nothing and usually just tells me all is well.
Here's what my fstab looks like:
# /dev/sda2
UUID=64AED8B5AED880CA /run/media/val/SLAB/ ntfs defaults,nofail,uid=1000,gid=1000,dmask=000,fmask=000,umask-000 0 0
This is what dmesg tells me when it crashes:
[ 41.800769] I/O error, dev sda, sector 142239592 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[ 41.800798] device offline error, dev sda, sector 142239592 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 41.800805] Buffer I/O error on dev sda2, logical block 17746925, async page read
[ 41.801022] device offline error, dev sda, sector 142239592 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 41.801027] Buffer I/O error on dev sda2, logical block 17746925, async page read
[ 41.801433] device offline error, dev sda, sector 142239656 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 41.801441] device offline error, dev sda, sector 142239656 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 41.801444] Buffer I/O error on dev sda2, logical block 17746933, async page read
[ 41.801736] device offline error, dev sda, sector 142239656 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 41.801740] Buffer I/O error on dev sda2, logical block 17746933, async page read
[ 41.802094] device offline error, dev sda, sector 142239592 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 41.802100] Buffer I/O error on dev sda2, logical block 17746925, async page read
[ 41.852538] Buffer I/O error on dev sda2, logical block 815920, async page read
[ 41.852697] Buffer I/O error on dev sda2, logical block 815920, async page read
[ 41.853050] Buffer I/O error on dev sda2, logical block 815920, async page read
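The sector numbers logged for "dev sda" and the logical blocks logged for "dev sda2" point at the same places on disk. A minimal sketch of the conversion, assuming 512-byte sectors, 4096-byte filesystem blocks, and a partition start of 264192 sectors. That start offset is inferred from the log itself (142239592 - 17746925 × 8 = 264192), not read from the partition table; "cat /sys/class/block/sda2/start" would show the real value. The function name is made up for illustration:

```sh
#!/bin/sh
# Map a whole-disk sector (as logged for "dev sda") to the 4 KiB logical
# block inside the partition (as logged for "dev sda2").
# ASSUMPTIONS: 512-byte sectors, 4096-byte blocks, and a PART_START that was
# inferred from the log; check /sys/class/block/sda2/start for the real value.
SECTOR_SIZE=512
BLOCK_SIZE=4096
PART_START=264192

sector_to_block() {
    echo $(( ($1 - PART_START) * SECTOR_SIZE / BLOCK_SIZE ))
}

sector_to_block 142239592   # prints 17746925
sector_to_block 142239656   # prints 17746933
```

Both pairs of numbers agree, and the repeated reads hit the same two blocks while the kernel reports "device offline", which fits a device that disappeared mid-read better than scattered on-disk damage; the arithmetic only shows consistency, it does not prove the cause. (264192 sectors is 129 MiB, which would match a 128 MiB first partition starting at 1 MiB, but that is a guess.)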
I'm at my wits' end. This is years of work I'm afraid of losing; any help would be appreciated.
edit: I ran a smartctl test on the drive and it came back with no errors. FWIW, I also usually have MEGAsync syncing a folder on this drive, but I don't see how that could cause issues. Feel free to enlighten me, though.
Last edited by levalithan (2024-11-24 13:22:59)
You could try 'ntfsfix' with the -d or -b option; add -n to see what would be done without actually doing anything. You may also want to read the short manual for ntfsprogs. Back up the whole partition, or at least the most important files, before you begin.
I've tried ntfsfix a bunch; it tells me there's nothing wrong with my drive. I ran it again just to post the output:
[val@tower ~]$ sudo ntfsfix -dn /dev/sda2
Mounting volume... OK
Processing of $MFT and $MFTMirr completed successfully.
Checking the alternate boot sector... OK
NTFS volume version is 3.1.
NTFS partition /dev/sda2 was processed successfully.
[val@tower ~]$ sudo ntfsfix -bn /dev/sda2
Mounting volume... OK
Processing of $MFT and $MFTMirr completed successfully.
Checking the alternate boot sector... OK
NTFS volume version is 3.1.
NTFS partition /dev/sda2 was processed successfully.
Now every time I try to load some files (be it games, music, whatever) through it, it reboots and remounts under /dev/sdb instead of sda like it normally is.
Because sda will already be taken by something else (or by the stale device).
This is what dmesg tells me when it crashes:
That's not a filesystem error…
Got a 4tb external Toshiba HDD
https://wiki.archlinux.org/title/Power_ … utosuspend
https://wiki.archlinux.org/title/Power_ … Management
And if in doubt, please post the error in context (i.e. the journal of an affected boot).
Changing the power management / autosuspend settings did nothing. I tried to access a file to upload in Firefox, and it spat out an "Input/output error" and unmounted/remounted the drive a bunch of times until it stopped.
if in doubt, please post the error in context (i.e. the journal of an affected boot)
For the current boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
NB: besides "usbcore.autosuspend=-1", userspace tools like TLP can and will alter the autosuspend settings at runtime, so you'll have to disable it there as well.
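To see which autosuspend settings are actually in effect (including anything TLP changed at runtime), you can read them back from sysfs. A minimal sketch: /sys/module/usbcore/parameters/autosuspend holds the default delay in seconds for newly detected devices (a negative value disables autosuspend), and each device's power/control reads "on" (never autosuspended) or "auto". The helper names are hypothetical:

```sh
#!/bin/sh
# Read back the USB autosuspend settings the kernel is actually using.
# describe_delay / describe_control are hypothetical helper names.

describe_delay() {
    # $1: default delay in seconds; a negative value disables autosuspend
    if [ "$1" -lt 0 ]; then
        echo "disabled"
    else
        echo "enabled after ${1}s idle"
    fi
}

describe_control() {
    # $1: contents of a device's power/control file
    case "$1" in
        on)   echo "never autosuspended" ;;
        auto) echo "may autosuspend" ;;
        *)    echo "unexpected value: $1" ;;
    esac
}

param=/sys/module/usbcore/parameters/autosuspend
if [ -r "$param" ]; then
    echo "default for new devices: $(describe_delay "$(cat "$param")")"
fi

# Per-device setting for every USB device and hub that exposes one.
for dev in /sys/bus/usb/devices/*; do
    [ -r "$dev/power/control" ] || continue
    echo "$(basename "$dev"): $(describe_control "$(cat "$dev/power/control")")"
done
```

Running it once after boot and again after the drive has been in use for a while would show whether something flips the drive back to "auto".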
What did you change the ALPM setting to? The wiki is outdated here: for a couple of kernel releases now, a lot of devices default to med_power_with_dipm, and the relevant setting would be max_performance (to guarantee behavior that is as stable as possible; you can step to medium_power from there in case ALPM actually turns out to be the problem).
Sorry, missed that. Here's the output: http://0x0.st/XnSM.txt
I believe I set the policy to max_performance, though I'm unsure it did anything since the drive is connected via USB rather than SATA. I also remember adding a udev rule that prevents the drive from spinning down or powering off (using the guide from the same wiki page), but that didn't seem to help.
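A quick way to check whether the ALPM setting can apply at all is to list the SCSI hosts and see which ones expose link_power_management_policy. My understanding (an assumption, though the udev "No such file or directory" error in the journal below points the same way) is that only SATA/libata hosts have that file, while a usb-storage host does not. list_alpm is a hypothetical name; its optional argument exists only so it can be tried against a fake directory tree:

```sh
#!/bin/sh
# List SCSI hosts, their driver (proc_name), and the ALPM policy if the
# host exposes one. $1 optionally overrides the sysfs directory (for testing).
list_alpm() {
    root=${1:-/sys/class/scsi_host}
    for h in "$root"/host*; do
        [ -d "$h" ] || continue
        name=$(cat "$h/proc_name" 2>/dev/null || echo "?")
        if [ -r "$h/link_power_management_policy" ]; then
            echo "$(basename "$h") ($name): $(cat "$h/link_power_management_policy")"
        else
            echo "$(basename "$h") ($name): no ALPM attribute"
        fi
    done
}

list_alpm
```

If the host the drive sits behind reports "usb-storage" and no ALPM attribute, the max_performance rule has nothing to act on for this drive.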
Nov 19 18:09:32 tower ntfs-3g[421]: ntfs_attr_pread error reading '…doom 2.png' at offset 0: 32768 <> -1: Input/output error
Nov 19 18:09:32 tower ntfs-3g[421]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
Nov 19 18:09:32 tower ntfs-3g[421]: ntfs_attr_pread error reading '…doom 2.png' at offset 0: 4096 <> -1: Input/output error
Nov 19 18:09:32 tower kernel: usb 2-5: USB disconnect, device number 2
Nov 19 18:09:32 tower kernel: xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Nov 19 18:09:32 tower kernel: sd 4:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Nov 19 18:09:32 tower kernel: sd 4:0:0:0: [sda] tag#0 CDB: Read(16) 88 00 00 00 00 00 01 e5 74 00 00 00 00 40 00 00
Nov 19 11:43:08 tower (udev-worker)[330]: host4: /etc/udev/rules.d/hd_power_save.rules:1 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:14.0/usb2/2-5/2-5:1.0/host4/scsi_host/host4/link_power_management_policy}="max_performance", ignoring: No such file or directory
Nov 19 11:43:11 tower kernel: scsi host4: usb-storage 2-5:1.0
Nov 19 18:09:32 tower kernel: usb 2-5: USB disconnect, device number 2
Nov 19 18:09:32 tower kernel: scsi host5: usb-storage 2-5:1.0
Nov 19 18:09:45 tower kernel: usb 2-5: USB disconnect, device number 4
Nov 19 18:09:46 tower kernel: scsi host5: usb-storage 2-5:1.0
So the device first fails after about 6½ h, reconnects immediately, disconnects again after 13 s, and it's just disaster from there on.
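Those durations can be pulled out of the journal mechanically. A minimal sketch, assuming journalctl's default short format ("Nov 19 18:09:32 host ..."), events within a single day (midnight is not handled), and that a "usb-storage" line marks the drive attaching. connection_lifetimes is a made-up name; with several USB storage devices attached you would filter on the port (2-5 here) first:

```sh
#!/bin/sh
# For each "USB disconnect", print how long the connection lasted since the
# preceding usb-storage attach line. Reads journal text on stdin.
# ASSUMPTIONS: "Mon DD HH:MM:SS" timestamps, all on the same day.
connection_lifetimes() {
    awk '
    function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /usb-storage/ { up = secs($3) }
    /USB disconnect/ {
        if (up != "") printf "%s disconnect after %ds\n", $3, secs($3) - up
        up = ""
    }'
}

# Usage:
#   journalctl -b -k | connection_lifetimes
```

On the attach/disconnect lines quoted above (11:43:11 through 18:09:46) this prints a first connection of 23181 s, about 6.4 h, followed by one of 13 s.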
"usbcore.autosuspend=-1" isn't in the kernel parameters? What exactly did you configure itr?
Nov 19 11:43:08 tower kernel: usb 2-6: Product: Logitech StreamCam
Just a hunch, can you try w/o the cam?
Nov 19 11:43:15 tower systemd[1]: Starting Hostname Service...
Nov 19 16:29:45 tower systemd[1]: Starting Hostname Service...
Nov 19 18:09:07 tower systemd[1]: Starting Hostname Service...
Nov 19 19:59:54 tower systemd[1]: Starting Hostname Service...
Nov 19 20:45:31 tower systemd[1]: Starting Hostname Service...
Nov 19 20:46:29 tower systemd[1]: Starting Hostname Service...
Any idea why the service gets started over and over again?
18:09:07 is the last logged event before the drive drops out.
Another thing: you seem to be accessing some pictures w/ a steam installation of https://archlinux.org/packages/extra/x86_64/krita/ ?
Is that a constant factor? (And why is it a Steam installation?)
"usbcore.autosuspend=-1" is not in kernel params, i have no idea why it would be failing like that beyond maybe anything i might've done with fstab but again, entirely unsure.
Will see if it fails without the cam over the next few days.
No clue why the hostname service would be restarting. I don't think i've done much config work on systemd beyond what's installed with Arch.
Krita is being accessed through Steam because it gets automatic updates through it, and I like being able to log my hours. I'm an artist by profession. It's using it's own native version, no proton or wine with no launch options.
Last edited by levalithan (2024-11-19 21:45:34)
you seem to be accessing some pictures w/ […] krita ?
Is that a constant factor?
"usbcore.autosuspend=-1" is not in kernel params
The plan would be to add it.
What exactly did you configure [with regards to usb autosuspend]?
Oops. Krita does not always trigger the crashes and chaos, though it is a factor; it's usually open all day, every day. I'll add that to my kernel params. I had tried completely disabling autosuspend with a udev rule, but I'll try the kernel parameter instead and see if that changes anything.
Krita does not always trigger the crashes and chaos
Then it likely just exposes the situation.
I assume you first used the drive for 6h w/o problems before it started failing?
If it's not the camera, do you have another USB cable? (Or did you manually reconnect the drive during/after the failures?)
a udev rule that prevents the drive from spinning down or powering off
Is that still in place? Can you feel when the disk is spinning? Is it spinning when the drive drops out?
Yeah, I have no idea what causes it to mess up. I read that it could be Steam (specifically Proton) creating files with names that are illegal on Windows, but I created a symlink to /home/steam/ to avoid that.
The udev rule should still be active, yes, unless it's somehow not working. The drive works fine mechanically, though I occasionally hear loud clicks when it powers off/restarts.
I do not think that the (fuse) filesystem issues trigger the device loss but the other way round.
You could of course use ntfs3 and will probably also have to chkdsk the drive after the many unclean removals.
Maybe the connection only collapses under pressure?
Try starting a couple of parallel dd jobs that read files from that disk into /dev/null (iflag=sync,nocache).
(Read your dd command three times before hitting enter, or mount the NTFS partition read-only.)
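A minimal sketch of that stress test: start one dd per file in the background, each reading into /dev/null with the flags suggested above, then wait for all of them and count the failures. parallel_read is a hypothetical name and the paths in the usage line are placeholders; it only writes to /dev/null, but the advice about double-checking (or mounting read-only) still applies:

```sh
#!/bin/sh
# Read all given files concurrently into /dev/null to load the USB link.
# Prints the number of dd jobs that failed (e.g. with an I/O error).
parallel_read() {
    pids=""
    for f in "$@"; do
        # iflag=sync,nocache as suggested in the thread; bs=1M for large reads
        dd if="$f" of=/dev/null bs=1M iflag=sync,nocache status=none &
        pids="$pids $!"
    done
    failures=0
    for pid in $pids; do
        wait "$pid" || failures=$((failures + 1))
    done
    echo "$failures"
}

# Usage (placeholder paths on the NTFS mount):
#   parallel_read /run/media/val/SLAB/big1.mkv /run/media/val/SLAB/big2.iso \
#                 /run/media/val/SLAB/big3.zip
```

Picking large files spread across the disk keeps several long reads in flight at once, which is the point of the exercise; a nonzero count together with "USB disconnect" in the journal would point at the connection rather than at any one application.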
chkdsk told me there were no errors. I used the /f flag as GParted often told me to, but GParted doesn't currently report anything wrong with it either. How would I go about making sure I'm using ntfs3 specifically? I was under the impression that's what I had been using all along, since ntfs-3g had some annoying permission issues.
I'll try running a dd command; I also think it may be happening under pressure. Initially I thought it had something to do with programs not being able to access the disk at the same time (while it's mounted there's a fairly noticeable performance hit, which I assumed was just because the partition is huge and not formatted as something like ext4), but again, we'll see. Thanks for the help so far.
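One way to check which NTFS driver actually mounted the partition is to ask findmnt for the filesystem type at the mount point: the in-kernel driver reports "ntfs3", while ntfs-3g runs through FUSE and shows up as "fuseblk". The journal lines tagged "ntfs-3g[421]" earlier in the thread suggest ntfs-3g was the one in use at that point. A minimal sketch; describe_fstype is a hypothetical helper and the mount point is the one from the fstab line above:

```sh
#!/bin/sh
# Report which driver serves an NTFS mount, based on the type findmnt shows.
describe_fstype() {
    case "$1" in
        ntfs3)   echo "in-kernel ntfs3 driver" ;;
        fuseblk) echo "FUSE driver (ntfs-3g for an NTFS partition)" ;;
        "")      echo "not mounted" ;;
        *)       echo "other filesystem type: $1" ;;
    esac
}

mnt=/run/media/val/SLAB
# -n: no header, -o FSTYPE: only that column, -M: match this exact mount point
fstype=$(findmnt -n -o FSTYPE -M "$mnt" 2>/dev/null || true)
describe_fstype "$fstype"
```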
Update: it seems like Steam is the major culprit. I opened it after running chkdsk on the drive this morning, and it immediately broke the drive again.
I set my fstab options to this:
UUID=64AED8B5AED880CA /run/media/val/SLAB/ ntfs3 rw,user,exec,nofail,prealloc,windows_names,uid=1000,gid=1000,dmask=000,fmask=000,umask=000 0 0
To no avail. I thought switching to ntfs3 might fix something, but steam still has issues writing files somehow.
A non-Steam-driven stress test didn't exhibit this behavior?
You could try the repo Krita (which I understand is the main thing you run through Steam?) and see whether it causes the same.
And for clarification:
steam still has issues writing files somehow
This also results in the drive disconnecting?
It has happened with VLC media player on occasion, but Steam is the primary culprit. I'm currently trying to set it up explicitly with ntfs-3g via fstab.
Steam is trying to sync cloud files and is spitting this out:
src/clientdll/remotestoragefilesynccontext.cpp (947) : Assertion Failed: Failed to write file after download (2)
Using Krita does not seem to cause crashes, just Steam itself. I think ntfs-3g would be a better option than ntfs3, so again, I'm going to try that now.
Another interesting find: after editing and reloading fstab and running 'sudo mount -a', 'mount' shows completely different mount options, seemingly independent of what I wrote in fstab.
Here's what's in fstab now:
UUID=64AED8B5AED880CA /run/media/val/SLAB/ ntfs-3g rw,user,exec,async,dev,nofail,prealloc,windows_names,force,uid=1000,gid=1000,umask=000 0 0
Here's what's listed in 'mount':
/dev/sda2 on /run/media/val/SLAB type fuseblk (rw,nosuid,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096,user)
edit: apparently that's due to the 'user' option. Whoops.
edit 2: scratch that, it's still happening. I read that it could be caused by Windows Fast Startup, but I don't have Windows installed on this drive. Really odd.
Last edited by levalithan (2024-11-20 15:20:59)
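To compare what fstab asks for with what is actually in effect, it helps to put both option lists side by side, one option per line. A minimal sketch; fstab_options is a hypothetical helper that prints the options of the first non-comment fstab entry for a mount point, matched exactly as written there (so the trailing slash must match), and the live list comes from findmnt. My understanding, not verified in this thread, is that for a FUSE mount like ntfs-3g some differences are expected: the listed user_id/group_id describe who mounted it, while the uid=/gid= options are handled by ntfs-3g itself and are not echoed back.

```sh
#!/bin/sh
# Print the options of the fstab entry for a mount point, one per line.
# $1: fstab file, $2: mount point exactly as written in that file.
fstab_options() {
    awk -v mp="$2" '
    $1 !~ /^#/ && $2 == mp {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) print opts[i]
        exit
    }' "$1"
}

# Usage: requested vs. live options
#   fstab_options /etc/fstab /run/media/val/SLAB/ | sort > /tmp/wanted
#   findmnt -n -o OPTIONS -M /run/media/val/SLAB | tr ',' '\n' | sort > /tmp/live
#   diff /tmp/wanted /tmp/live
```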
I don't have Windows installed on this drive
The drive is less important than the mainboard, and I'll reiterate that I do not believe filesystem errors are at the root of this.
You get filesystem errors because you get IO errors because the drive disconnects.
1. Try a different cable.
2. Did you test the behavior when artificially stressing the IO?
I'll order a new cable and see if that makes a difference, but the factory cable seems to be in good condition, and I can't think of anything that would have damaged it.
I performed a benchmark using GNOME Disks and nothing out of the ordinary happened. It seems to be performing fine.
Progress! Steam can now perform Cloud Sync without nuking the drive! I'll post an update if it fails again, and how.
Last edited by levalithan (2024-11-20 15:50:48)
It crashed again while I was trying to access an image to upload in Firefox: http://0x0.st/XnOs.txt
I performed a benchmark using GNOME Disks
That probably won't stress the bus; the idea is to create multiple parallel operations, not to see how fast the disk can spin.