#1 2024-09-29 09:01:35

makh
Member
Registered: 2011-10-10
Posts: 305

Disk health - video file error

Hi,
My hard disk got full... I am still getting the following errors even after freeing up disk space!

fsck returns ok
partition is ext4 for data only.
system updated about one month ago!

I get this when playing a few video files:

VLC media player 3.0.21 Vetinari (revision 3.0.21-0-gdd8bfdbabe8)
[0000569052e4c520] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
[00007c070cc2c550] avcodec decoder: Using Intel iHD driver for Intel(R) Gen Graphics - 24.3.2 () for hardware decoding
[00007c070c001720] filesystem stream error: read error: Input/output error
[00007c0708000c90] main input error: ES_OUT_SET_(GROUP_)PCR  is called too late (pts_delay increased to 1000 ms)
[00007c070cc2c550] main decoder error: Timestamp conversion failed for 70500001: no reference clock
[00007c070cc2c550] main decoder error: Could not convert timestamp 0 for FFmpeg
[h264 @ 0x7c070cd42340] Invalid NAL unit size (50320 > 30725).
[h264 @ 0x7c070cd42340] Error splitting the input into NAL units.
[h264 @ 0x7c070cc89a00] co located POCs unavailable
[h264 @ 0x7c070ccd1d40] co located POCs unavailable
[00007c070c001720] filesystem stream error: read error: Input/output error
[00007c070c001720] filesystem stream error: read error: Input/output error
[00007c0708000c90] main input error: ES_OUT_SET_(GROUP_)PCR  is called too late (pts_delay increased to 1126 ms)
[00007c070cc2c550] main decoder error: Timestamp conversion failed for 75500001: no reference clock
[00007c070cc2c550] main decoder error: Could not convert timestamp 0 for FFmpeg
[00007c070c001720] filesystem stream error: read error: Input/output error
[00007c070cc0f840] avcodec decoder: Using Intel iHD driver for Intel(R) Gen Graphics - 24.3.2 () for hardware decoding

smartctl -H /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Is there any tweak possible?

Thank you


OS:  Arch  &/  Debian
System: LENOVO ThinkPad E14
Desktop: Xfce

#2 2024-09-30 04:45:59

mpan
Member
Registered: 2012-08-01
Posts: 1,597

Re: Disk health - video file error

If we make the assumption that the EIO error⁽¹⁾ comes from a `read` call on a file and is hardware-related,⁽²⁾ it may be any hardware in the I/O path: starting from some controller, through the SATA cable, through the HDD, to voltage regulation. So no need to panic yet.

`smartctl -H` is a passive check, which only reports a failure once the drive is basically dead. It neither runs any tests nor provides aging-related measurements.

How to approach:
First verify the issue occurs in software other than VLC. Try mpv for example. There is very little chance it’s a bug in the code, but it can’t be completely excluded, and ruling it out would save you a lot of effort and nerves.

(removed: will not work for VLC) That step is optional, but the second thing to check is which syscall fails:

strace -Z /usr/bin/vlc path_to_file

You’re looking for calls failing just before “Input/output error” is reported. If it’s `read(…) = -1`, continue down the list here. If you can’t see any `read` there, please report it. Possibly the issue is elsewhere.

Third step: report here the complete output of:

sudo smartctl -A /dev/sda

Please also provide the exact drive model, as reported by:

sudo smartctl -i /dev/sda

Check if there are any errors appearing in the journal around the moment you try to play the file. You may use the -f (--follow) option to follow the output:

sudo journalctl -f

Report those too.
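
The journal check can be narrowed down to the lines that matter. Below is a minimal Python sketch, assuming the kernel reports failed block reads in the form `I/O error, dev sda, sector N` (the text before that part varies between kernel versions). The function name `failing_sectors` and the sample text are illustrative and not part of any tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "I/O error, dev sda, sector 1728526800 op 0x0:(READ) ..."
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_sectors(journal_text):
    """Count I/O errors per (device, sector) pair found in journal text."""
    hits = Counter()
    for line in journal_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            hits[(match.group(1), int(match.group(2)))] += 1
    return hits


# Illustrative sample in the kernel's block-layer error format.
sample = """\
Sep 30 15:38:21 mymachine kernel: I/O error, dev sda, sector 1728526800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Sep 30 15:38:21 mymachine kernel: ata1: EH complete
Sep 30 15:38:22 mymachine kernel: I/O error, dev sda, sector 1728526800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
"""

for (device, sector), count in failing_sectors(sample).items():
    print(f"{device}: sector {sector} failed {count} time(s)")
```

The input could be saved output of `journalctl -k`. If the same sector shows up again and again, that would suggest a problem tied to one location on the disk rather than random communication glitches.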

Fourth step: if you feel comfortable with doing things inside your computer, make sure the SATA cable is firmly connected. Cables may come loose, causing communication errors.

You may also run a full self-test:

sudo smartctl -t long /dev/sda

Note: this will take a long time. The test runs on the disk itself and doesn’t interfere with normal system activities, except possibly reducing performance a bit. The “Self-test execution status” entry in the output of the following command gives a rough estimate of how much time is left until the test is over:

sudo smartctl -c /dev/sda

The log of finished tests is available at:

sudo smartctl -l selftest /dev/sda

But I would leave this until we see the SMART values and other answers to the above.
____
⁽¹⁾ Errno 5, “Input/output error”
⁽²⁾ It may come from other calls and, in particular in such cases, it may not be caused by any low-level malfunction.
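
To see where in a file `read` returns EIO without involving a media player, one could probe the file chunk by chunk. This is only a minimal Python sketch: the helper name `unreadable_ranges` and the 4096-byte chunk size are my assumptions. Every probe is a real read, so on a failing drive it adds some load, and data already sitting in the page cache may read fine even if the sector on disk is bad.

```python
import errno
import os


def unreadable_ranges(path, chunk=4096):
    """Return (offset, length) pairs for chunks whose read fails with EIO."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            length = min(chunk, size - offset)
            try:
                # pread reads at an explicit offset, so a failed chunk
                # does not stop us from probing the following ones.
                os.pread(fd, length, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # some other problem; do not hide it
                bad.append((offset, length))
            offset += length
    finally:
        os.close(fd)
    return bad
```

An empty list means every chunk could be read. A non-empty list gives the byte ranges inside the file that hit errno 5.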

Last edited by mpan (2024-10-01 22:21:33)


Paperclips in avatars? | Sometimes I seem a bit harsh — don’t get offended too easily!

#3 2024-09-30 07:35:33

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,815

Re: Disk health - video file error

mpan wrote:

Try mpv for example. There is very little chance it’s a bug in the code, but it can’t be completely excluded, and ruling it out would save you a lot of effort and nerves.

Also try

mpv --hwdec=no file.mp4

to rule out VAAPI-related errors. Also check in general whether multiple files are affected and whether you can play "mpv 'https://www.youtube.com/watch?v=YE7VzlLtp-4'" (Big Buck Bunny, requires yt-dlp).

#4 2024-10-01 10:12:04

makh
Member
Registered: 2011-10-10
Posts: 305

Re: Disk health - video file error

Hi
Thanks for your kind time!
...

I get the following responses:

mpv

gives the same error; the file stops playing after 1 min and a few seconds


sudo journalctl -f

Sep 30 15:34:51 mymachine kernel: sd 0:0:0:0: [sda] tag#20 Add. Sense: Unrecovered read error - auto reallocate failed
Sep 30 15:34:51 mymachine kernel: sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 67 07 39 d0 00 00 08 00
Sep 30 15:34:51 mymachine kernel: I/O error, dev sda, sector 1728526800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Sep 30 15:34:51 mymachine kernel: ata1: EH complete
Sep 30 15:35:34 mymachine systemd[1]: Starting Cleanup of Temporary Directories...
Sep 30 15:35:35 mymachine systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Sep 30 15:35:35 mymachine systemd[1]: Finished Cleanup of Temporary Directories.
Sep 30 15:35:35 mymachine systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Sep 30 15:36:18 mymachine su[3432]: (to supermyuser) myuser on pts/1
Sep 30 15:36:18 mymachine su[3432]: pam_unix(su-l:session): session opened for user supermyuser(uid=1001) by myuser(uid=1000)
Sep 30 15:38:20 mymachine kernel: ata1.00: exception Emask 0x0 SAct 0x20000 SErr 0x40000 action 0x0
Sep 30 15:38:21 mymachine kernel: ata1.00: irq_stat 0x40000008
Sep 30 15:38:21 mymachine kernel: ata1: SError: { CommWake }
Sep 30 15:38:21 mymachine kernel: ata1.00: failed command: READ FPDMA QUEUED
Sep 30 15:38:21 mymachine kernel: ata1.00: cmd 60/08:88:d0:39:07/00:00:67:00:00/40 tag 17 ncq dma 4096 in
                                           res 41/40:00:d0:39:07/00:00:67:00:00/00 Emask 0x409 (media error) <F>
Sep 30 15:38:21 mymachine kernel: ata1.00: status: { DRDY ERR }
Sep 30 15:38:21 mymachine kernel: ata1.00: error: { UNC }
Sep 30 15:38:21 mymachine kernel: ata1.00: configured for UDMA/133
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#17 Sense Key : Medium Error [current] 
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#17 Add. Sense: Unrecovered read error - auto reallocate failed
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 67 07 39 d0 00 00 08 00
Sep 30 15:38:21 mymachine kernel: I/O error, dev sda, sector 1728526800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Sep 30 15:38:21 mymachine kernel: ata1: EH complete
Sep 30 15:38:21 mymachine kernel: ata1.00: exception Emask 0x0 SAct 0x1000 SErr 0x0 action 0x0
Sep 30 15:38:21 mymachine kernel: ata1.00: irq_stat 0x40000008
Sep 30 15:38:21 mymachine kernel: ata1.00: failed command: READ FPDMA QUEUED
Sep 30 15:38:21 mymachine kernel: ata1.00: cmd 60/08:60:d0:39:07/00:00:67:00:00/40 tag 12 ncq dma 4096 in
                                           res 41/40:00:d0:39:07/00:00:67:00:00/00 Emask 0x409 (media error) <F>
Sep 30 15:38:21 mymachine kernel: ata1.00: status: { DRDY ERR }
Sep 30 15:38:21 mymachine kernel: ata1.00: error: { UNC }
Sep 30 15:38:21 mymachine kernel: ata1.00: configured for UDMA/133
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#12 Sense Key : Medium Error [current] 
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#12 Add. Sense: Unrecovered read error - auto reallocate failed
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#12 CDB: Read(10) 28 00 67 07 39 d0 00 00 08 00
Sep 30 15:38:21 mymachine kernel: I/O error, dev sda, sector 1728526800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Sep 30 15:38:21 mymachine kernel: ata1: EH complete
Sep 30 15:38:21 mymachine kernel: ata1.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x0
Sep 30 15:38:21 mymachine kernel: ata1.00: irq_stat 0x40000008
Sep 30 15:38:21 mymachine kernel: ata1.00: failed command: READ FPDMA QUEUED
Sep 30 15:38:21 mymachine kernel: ata1.00: cmd 60/08:28:d0:39:07/00:00:67:00:00/40 tag 5 ncq dma 4096 in
                                           res 41/40:00:d0:39:07/00:00:67:00:00/00 Emask 0x409 (media error) <F>
Sep 30 15:38:21 mymachine kernel: ata1.00: status: { DRDY ERR }
Sep 30 15:38:21 mymachine kernel: ata1.00: error: { UNC }
Sep 30 15:38:21 mymachine kernel: ata1.00: configured for UDMA/133
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#5 Sense Key : Medium Error [current] 
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
Sep 30 15:38:21 mymachine kernel: sd 0:0:0:0: [sda] tag#5 CDB: Read(10) 28 00 67 07 39 d0 00 00 08 00
Sep 30 15:38:21 mymachine kernel: I/O error, dev sda, sector 1728526800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Sep 30 15:38:21 mymachine kernel: ata1: EH complete
Sep 30 15:38:22 mymachine kernel: ata1.00: exception Emask 0x0 SAct 0x801000 SErr 0x0 action 0x0
Sep 30 15:38:22 mymachine kernel: ata1.00: irq_stat 0x40000008
Sep 30 15:38:22 mymachine kernel: ata1.00: failed command: READ FPDMA QUEUED
Sep 30 15:38:22 mymachine kernel: ata1.00: cmd 60/08:b8:d0:39:07/00:00:67:00:00/40 tag 23 ncq dma 4096 in
                                           res 41/40:00:d0:39:07/00:00:67:00:00/00 Emask 0x409 (media error) <F>
Sep 30 15:38:22 mymachine kernel: ata1.00: status: { DRDY ERR }
Sep 30 15:38:22 mymachine kernel: ata1.00: error: { UNC }
Sep 30 15:38:22 mymachine kernel: ata1.00: configured for UDMA/133
Sep 30 15:38:22 mymachine kernel: sd 0:0:0:0: [sda] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Sep 30 15:38:22 mymachine kernel: sd 0:0:0:0: [sda] tag#23 Sense Key : Medium Error [current] 
Sep 30 15:38:22 mymachine kernel: sd 0:0:0:0: [sda] tag#23 Add. Sense: Unrecovered read error - auto reallocate failed
Sep 30 15:38:22 mymachine kernel: sd 0:0:0:0: [sda] tag#23 CDB: Read(10) 28 00 67 07 39 d0 00 00 08 00
Sep 30 15:38:22 mymachine kernel: I/O error, dev sda, sector 1728526800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Sep 30 15:38:22 mymachine kernel: ata1: EH complete

strace -Z /usr/bin/vlc file.mp4

access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
VLC media player 3.0.21 Vetinari (revision 3.0.21-0-gdd8bfdbabe8)
openat(AT_FDCWD, "/usr/lib/glibc-hwcaps/x86-64-v3/liblirc_client.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/lib/glibc-hwcaps/x86-64-v3/", 0x7ffebedbf840, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/glibc-hwcaps/x86-64-v2/liblirc_client.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/lib/glibc-hwcaps/x86-64-v2/", 0x7ffebedbf840, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/liblirc_client.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libnfs.so.14", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libliveMedia.so.112", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libaribb25.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libprojectM.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libgoom2.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libtiger.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libSDL_image-1.2.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/vlc/../locale/en_US.UTF-8/LC_MESSAGES/vlc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/vlc/../locale/en_US.utf8/LC_MESSAGES/vlc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/vlc/../locale/en_US/LC_MESSAGES/vlc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/vlc/../locale/en.UTF-8/LC_MESSAGES/vlc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/vlc/../locale/en.utf8/LC_MESSAGES/vlc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/vlc/../locale/en/LC_MESSAGES/vlc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/myuser/.local/share/vlc/lua/meta/reader", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/vlc/lua/meta/reader", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
futex(0x58b1caf86480, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=0, tv_nsec=0}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
openat(AT_FDCWD, "/usr/lib/vlc/glibc-hwcaps/x86-64-v3/libvlc_pulse.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/lib/vlc/glibc-hwcaps/x86-64-v3/", 0x7ffebedbf770, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/vlc/glibc-hwcaps/x86-64-v2/libvlc_pulse.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/lib/vlc/glibc-hwcaps/x86-64-v2/", 0x7ffebedbf770, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/vlc/libpulse.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/pulseaudio/glibc-hwcaps/x86-64-v3/libpulsecommon-17.0.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/lib/pulseaudio/glibc-hwcaps/x86-64-v3/", 0x7ffebedbf670, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/pulseaudio/glibc-hwcaps/x86-64-v2/libpulsecommon-17.0.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/lib/pulseaudio/glibc-hwcaps/x86-64-v2/", 0x7ffebedbf670, 0) = -1 ENOENT (No such file or directory)
futex(0x7ffebedc08a4, FUTEX_UNLOCK_PI_PRIVATE) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/home/myuser/.pulse/client.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/myuser/.config/pulse/client.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/pulse/client.conf.d", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/pulseaudio.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/pulseaudio.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/pulseaudio.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/pulseaudio.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/pulseaudio.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/pulseaudio.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
mkdir("/run/user/1000/pulse", 0700)     = -1 EEXIST (File exists)
readlink("/run", 0x7ffebedc0440, 1023)  = -1 EINVAL (Invalid argument)
readlink("/run/user", 0x7ffebedc0440, 1023) = -1 EINVAL (Invalid argument)
readlink("/run/user/1000", 0x7ffebedc0440, 1023) = -1 EINVAL (Invalid argument)
readlink("/run/user/1000/pulse", 0x7ffebedc0440, 1023) = -1 EINVAL (Invalid argument)
sendto(4, "W", 1, MSG_NOSIGNAL, NULL, 0) = -1 ENOTSOCK (Socket operation on non-socket)
recvmsg(10, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(10, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(14, 0x58b1cafd8430, 8, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[000058b1caeee520] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
recvmsg(12, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
[00007dd690c10d80] avcodec decoder: Using Intel iHD driver for Intel(R) Gen Graphics - 24.3.2 () for hardware decoding
[00007dd690001770] filesystem stream error: read error: Input/output error
[h264 @ 0x7dd690d34700] Invalid NAL unit size (4236 > 3948).
[h264 @ 0x7dd690d34700] Error splitting the input into NAL units.
[00007dd690001770] filesystem stream error: read error: Input/output error
[00007dd69c000c90] main input error: ES_OUT_SET_(GROUP_)PCR  is called too late (pts_delay increased to 1000 ms)
[00007dd690c0dce0] avcodec decoder: Using Intel iHD driver for Intel(R) Gen Graphics - 24.3.2 () for hardware decoding
mkdir("/home/myuser/.local/share/vlc", 0700) = -1 EEXIST (File exists)
+++ exited with 0 +++

smartctl -i /dev/sda

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.49-1-lts] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue Mobile (SMR)
Device Model:     WDC WD10SPZX-08Z10
Serial Number:    WD-WX92AB02AXLX
LU WWN Device Id: 5 0014ee 2136331dc
Firmware Version: 05.01A05
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Sep 30 15:50:29 2024 PKT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

smartctl -t long /dev/sda

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.49-1-lts] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 205 minutes for test to complete.
Test will complete after Mon Sep 30 19:17:23 2024 PKT
Use smartctl -X to abort test.

mpv --hwdec=no file.mp4

The file suddenly stopped playing after approximately one minute!

smartctl -l selftest /dev/sda

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.49-1-lts] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      3698         946570616
# 2  Extended offline    Completed: read failure       90%      3693         946570616
# 3  Short offline       Completed: read failure       90%      3684         946570616
# 4  Extended offline    Completed: read failure       90%      3684         946570616

Q: We once used to defragment disks in Windows 2000; is there something similar for ext4 that moves data off the damaged part of the hard disk and marks it as unusable?


Thank you


#5 2024-10-01 10:23:57

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,060

Re: Disk health - video file error

https://wiki.archlinux.org/title/Badblocks

Start with the non-destructive rw test; it will give a good idea of how much is damaged.


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

#6 2024-10-01 11:45:08

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,815

Re: Disk health - video file error

We haven't seen "smartctl -a /dev/sda" and certainly should take a look at that FIRST.
The non-destructive badblocks run will take ages and write to the disk A LOT, so if it's falling apart you run the risk of terminally destroying data this way!

#7 2024-10-01 22:39:12

mpan
Member
Registered: 2012-08-01
Posts: 1,597

Re: Disk health - video file error

From what we got, it seems there is at least one sector with data completely⁽¹⁾ destroyed. Please provide `sudo smartctl -A /dev/sda` (that’s uppercase -A, not lowercase -a). This will tell whether there are any indications of the drive being in bad shape.

I don’t see a reason to run badblocks. Not yet, at least. Let’s see the SMART data first. I’d also stick to the read-only test⁽²⁾ in this case. We already know there are sectors with a corrupted signal. Not only does searching for other kinds of errors seem pointless to me, but the read-write test will also be hampered by read errors: it can’t do test writes to unreadable sectors, so the unreadable sectors effectively mask other issues.

I’ve also realized that the strace invocation, as suggested above, doesn’t deliver useful information. That’s due to how VLC itself operates. Edited the post.
____
⁽¹⁾ HDDs use an encoding resilient against minor errors, hence multiple levels of damage are possible. The signal may be corrupted but still readable. In this case too much is damaged to read the sector.
⁽²⁾ The default for badblocks, if neither read-write nor destructive test is requested.
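
The journal lines posted earlier carry enough information to check where the failed reads land. A minimal Python sketch, assuming the standard 10-byte READ(10) layout (opcode 0x28, big-endian LBA in bytes 2 to 5, transfer length in bytes 7 to 8). `decode_read10` is an illustrative name; the sector sizes are the ones `smartctl -i` reported for this drive.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (lba, sector_count). Bytes 2-5 hold the big-endian LBA,
    bytes 7-8 the big-endian transfer length in logical blocks.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command block")
    lba = int.from_bytes(cdb[2:6], "big")
    count = int.from_bytes(cdb[7:9], "big")
    return lba, count


LOGICAL_SECTOR = 512    # "Sector Sizes: 512 bytes logical"
PHYSICAL_SECTOR = 4096  # "... 4096 bytes physical"

# CDB copied from the kernel log in this thread.
lba, count = decode_read10("28 00 67 07 39 d0 00 00 08 00")
print(lba, count)                                # 1728526800 8
print(count * LOGICAL_SECTOR)                    # 4096 bytes requested
print(lba * LOGICAL_SECTOR % PHYSICAL_SECTOR)    # 0: starts on a 4096-byte boundary
```

The decoded LBA matches the `sector 1728526800` in the kernel’s I/O error lines. The request covers 8 logical sectors, i.e. 4096 bytes starting on a 4096-byte boundary, which corresponds to a single physical sector on this drive. Every failed read in the posted log carries the same CDB.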


#8 2024-10-02 04:03:28

makh
Member
Registered: 2011-10-10
Posts: 305

Re: Disk health - video file error

Hi

Lone_Wolf wrote:

Start with the non-destructive rw test; it will give a good idea of how much is damaged.

Let me try this!


seth wrote:

We haven't seen "smartctl -a /dev/sda" and certainly should take a look at that FIRST.
The non-destructive badblocks run will take ages and write to the disk A LOT, so if it's falling apart you run the risk of terminally destroying data this way!



smartctl -a /dev/sda

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.49-1-lts] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue Mobile (SMR)
Device Model:     WDC WD10SPZX-08Z10
Serial Number:    WD-WX92AB02AXLX
LU WWN Device Id: 5 0014ee 2136331dc
Firmware Version: 05.01A05
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Oct  2 08:57:38 2024 PKT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(10620) seconds.
Offline data collection
capabilities: 			 (0x71) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 205) minutes.
Conveyance self-test routine
recommended polling time: 	 (   3) minutes.
SCT capabilities: 	       (0x303d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   197   196   051    Pre-fail  Always       -       128
  3 Spin_Up_Time            0x0027   194   186   021    Pre-fail  Always       -       1266
  4 Start_Stop_Count        0x0032   026   026   000    Old_age   Always       -       74819
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3705
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   090   090   000    Old_age   Always       -       10203
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       130
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       74
193 Load_Cycle_Count        0x0032   143   143   000    Old_age   Always       -       172545
194 Temperature_Celsius     0x0022   104   089   000    Old_age   Always       -       39 (Min/Max 10/54)
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
206 Flying_Height           0x0022   100   000   000    Old_age   Always       -       36
240 Head_Flying_Hours       0x0032   096   096   000    Old_age   Always       -       3238

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      3702         946570616
# 2  Extended offline    Completed: read failure       90%      3698         946570616
# 3  Extended offline    Completed: read failure       90%      3693         946570616
# 4  Short offline       Completed: read failure       90%      3684         946570616
# 5  Extended offline    Completed: read failure       90%      3684         946570616

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

@mpan
for: smartctl -A /dev/sda

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.49-1-lts] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   197   196   051    Pre-fail  Always       -       128
  3 Spin_Up_Time            0x0027   194   186   021    Pre-fail  Always       -       1266
  4 Start_Stop_Count        0x0032   026   026   000    Old_age   Always       -       74819
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3705
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   090   090   000    Old_age   Always       -       10203
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       130
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       74
193 Load_Cycle_Count        0x0032   143   143   000    Old_age   Always       -       172545
194 Temperature_Celsius     0x0022   104   089   000    Old_age   Always       -       39 (Min/Max 10/54)
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
206 Flying_Height           0x0022   100   000   000    Old_age   Always       -       36
240 Head_Flying_Hours       0x0032   096   096   000    Old_age   Always       -       3238

Thanks for your kind guidance!


OS:  Arch  &/  Debian
System: LENOVO ThinkPad E14
Desktop: Xfce

#9 2024-10-02 06:44:27

mpan
Member
Registered: 2012-08-01
Posts: 1,597

Re: Disk health - video file error

While not perfect, IMO the data doesn’t indicate anything horrible happening to the disk. I would assume bit rot.

The HDD hasn’t detected any unwriteable sectors yet. Those would cause reallocations, but the reallocation count (entry #5) is 0. It did detect 7 sectors which it couldn’t read (entry #197). While that is concerning, we can only know whether it’s a persistent or a transient problem after the HDD resolves them. The action taken by the drive depends on the brand and line. Unless this number continues to rise, I wouldn’t panic. All values are on the safe side of the manufacturer-defined thresholds.
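
To see whether entries #5 and #197 keep rising, the raw values can be pulled out of the `smartctl -A` table and noted down now and then. A minimal sketch, assuming the table layout posted above (ID in the first column, raw value in the tenth); the function name `smart_raw` is made up for illustration:

```shell
# smart_raw ID...: read `smartctl -A` output on stdin and print
# "ID NAME RAW" for the attribute IDs given as arguments.
# Assumes the classic ATA attribute table: column 1 = ID, column 2 = name,
# column 10 = first word of RAW_VALUE.
smart_raw() {
    awk -v ids="$*" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 && ($1 in want) { print $1, $2, $10 }
    '
}

# Usage (needs root and a real drive, so it is left commented out):
# sudo smartctl -A /dev/sda | smart_raw 5 197
```

Appending that output to a dated log file every few days would be enough to spot a trend.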

Some other observations. The start/stop count is large: I assume this is from power saving. Opinions are mixed on how much that harms the drive. There are lines of drives, like WD Green, which were known to start/stop themselves to death. Other lines see no such reports, and power saving is widespread in laptops. The temperature range recorded by the drive is also pretty big: from 10 °C to 54 °C. HDDs generally don’t like large temperature variations.
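
To put "large" in proportion, the counters can be divided by the power-on hours. A rough sketch using the raw values from the `smartctl -A` output posted above (entries #4, #193 and #9); `per_hour` is just an illustrative helper name:

```shell
# per_hour COUNT HOURS: print COUNT/HOURS rounded to one decimal place.
per_hour() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v c="$1" -v h="$2" 'BEGIN {
        if (h <= 0) { print "n/a"; exit }
        printf "%.1f\n", c / h
    }'
}

# Raw values copied from the table in this thread.
per_hour 74819 3705     # Start_Stop_Count / Power_On_Hours, prints 20.2
per_hour 172545 3705    # Load_Cycle_Count / Power_On_Hours, prints 46.6
```

That is roughly 20 start/stops and 47 load cycles per powered-on hour, which fits the power-saving assumption.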


If those values stay stable, I would just continue using the drive. In a laptop, a few read errors are to be expected. The movie should be considered irreversibly damaged: nothing can be done about that. If you delete the file, the file system will reuse the space later. If the drive is fine, it will just be overwritten with no issues. The same applies wherever else errors turn up. As always, keep backups of critical data. This holds regardless of your HDD’s condition.


As for badblocks, the default mode is a read-only test. In this mode badblocks tries to read all sectors from the HDD. Other than the time needed, it’s a lightweight test. If there are other unreadable sectors, the drive will detect them. This may cause entry #197 to go up a bit. The read-write test is much heavier. It reads each block, overwrites it with test patterns, and then writes the original data back. It detects unreadable sectors and some⁽¹⁾ unwriteable sectors. However, IMO in your situation it’s unlikely to report unwriteable sectors: the drive will silently remap them. At best you’ll see the reallocation count going up.
____
⁽¹⁾ It can’t detect sectors that are both unreadable and unwriteable. Reliably detecting unwriteable sectors requires a destructive write test.

Last edited by mpan (2024-10-02 06:52:02)


Paperclips in avatars? | Sometimes I seem a bit harsh — don’t get offended too easily!

#10 2024-10-02 12:02:14

makh
Member
Registered: 2011-10-10
Posts: 305

Re: Disk health - video file error

Hi,

This is a laptop.

Should I disable tlp for disk?

Can a few mp4 files get damaged even though they weren’t moved, on the partition showing the fault?

Is this a read-only test? Do I need to unmount first? Or is the badblocks command more suitable?

mkfs.ext4 -cc /dev/sda8

Thank you


#11 2024-10-02 13:00:42

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,815

Re: Disk health - video file error

Please avoid fully quoting previous posts, it serves no purpose but bloats the thread.

tlp is the least of your concerns right now. You want to know whether you can still trust that disk or whether it's about to fall apart and you're just noticing the first symptoms.
I'd back up all important data (stuff you can't download from the internet), run badblocks in read-only mode, and see whether more issues flare up and whether the SMART data takes a (more) negative turn. Then run badblocks in read-write mode (the non-destructive test will take much longer, but if the disk is ok, you get to keep your system) and check the SMART data again.
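
The "check the SMART data again" step is easier with saved snapshots to compare. A sketch, assuming each snapshot is the saved output of `smartctl -A /dev/sda` in the table layout posted earlier in the thread; `smart_changed` and the file names are hypothetical:

```shell
# smart_changed BEFORE AFTER: print attributes whose raw value differs
# between two saved `smartctl -A` outputs, as "ID NAME OLD -> NEW".
# Assumes column 1 = ID, column 2 = name, column 10 = raw value.
smart_changed() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if (FNR == NR) { old[$1] = $10; next }   # first file: remember
            if (($1 in old) && old[$1] != $10)       # second file: compare
                print $1, $2, old[$1], "->", $10
        }
    ' "$1" "$2"
}

# Usage (run as root; file names are examples only):
# smartctl -A /dev/sda > smart-before.txt
# badblocks -s /dev/sda
# smartctl -A /dev/sda > smart-after.txt
# smart_changed smart-before.txt smart-after.txt
```

Counters such as Power_On_Hours will always show up as changed; the ones worth watching here are #5 and #197.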

Can a few mp4 files get damaged even though they weren’t moved, on the partition showing the fault?

They're *on* the bad partition, no? They could have gone bad when they were written, or they may just happen to sit on degrading sectors.

#12 2024-10-03 01:23:07

mpan
Member
Registered: 2012-08-01
Posts: 1,597

Re: Disk health - video file error

makh wrote:

Should I disable tlp for disk?

It’s up to you. You decide what balance between the various factors you prefer. In the simplest form, you weigh your belief that power cycling reduces the HDD’s life against how useful your laptop is as a portable, battery-powered device.⁽¹⁾

makh wrote:

Can a few mp4 files get damaged even though they weren’t moved, on the partition showing the fault?

Yes. And simple bit rot may be the culprit. This is why backups of critical data are important. You don’t need a malfunction, a catastrophic failure, or a disaster to lose your data. You may fall victim to plain physics.

makh wrote:

Is this a read-only test? Do I need to unmount first? Or is the badblocks command more suitable?

mkfs.ext4 -cc /dev/sda8

NO! This will create a new file system on the partition, wiping what is on it.

badblocks (from the core/e2fsprogs package) is the tool you want to use.

To run the read-only (always safe)⁽²⁾ test, invoke badblocks with no options. You may add `-s` to see progress. Also, preferably run it on the entire HDD, not just a single partition:

sudo badblocks -s /dev/sda

Since you use a laptop, consider leaving it lying flat and well ventilated for the entire process. After all, that’s going to be hours of the HDD working constantly.

The read-write test (`-n` option) and the write-only test (`-w` option) both write to the disk. The latter (write-only) is always destructive: data will be lost. The safety of the former (read-write) is more nuanced. You must not run it on a mounted filesystem or while anything else writes to the tested disk/partition.⁽³⁾ If that condition is met, the read-write test on a generally healthy(!) drive is safe. However, it still does writes, so on a failing drive it may introduce new errors into otherwise intact data.
____
⁽¹⁾ In reality there are too many factors to weigh them all, so choosing between the two I mentioned above is IMO enough. What would also have to be added to the equation is battery wear, stress on the power regulators, poor cooling, and excess heat. The effect of turning off HDDs is also more complicated in laptops than in stationary devices. While the general notion is that it reduces life span, a portable device is also subject to vibrations and movement, both of which powering down an HDD may counter, potentially giving a positive net effect. So far I have seen zero reliable sources giving an answer to that.
⁽²⁾ No more dangerous than having the HDD running and doing reads for a few hours.
⁽³⁾ The read-write operation of badblocks is not atomic. So it may happen that the sequence of I/O operations is: badblocks reads X, something else writes Y to the same place, badblocks writes X back. That leads to badblocks destroying the data which something else has written.
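
The "not on a mounted filesystem" rule can be checked before starting a read-write run. A minimal sketch, assuming a Linux system where `/proc/mounts` lists mounted devices by their `/dev/...` name; mounts that go through device-mapper or similar layers, and swap, would not be caught. `is_mounted` is an illustrative name, not an existing tool:

```shell
# is_mounted DEVICE [MOUNTS_FILE]: succeed if DEVICE itself or one of its
# partitions (e.g. /dev/sda -> /dev/sda8) appears in the mounts table.
# MOUNTS_FILE defaults to /proc/mounts; the override exists for testing.
# The match is a plain prefix test, so it errs on the cautious side
# (/dev/sda would also match a hypothetical /dev/sdaa).
is_mounted() {
    awk -v dev="$1" '
        # First field of each line is the mounted device.
        index($1, dev) == 1 { found = 1; exit }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Usage:
# if is_mounted /dev/sda; then
#     echo "something on /dev/sda is mounted; do not run badblocks -n" >&2
# fi
```

This only covers the mounted-filesystem half of the rule; nothing else may write to the disk during the test either.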

