Hi,
My setup is as follows:
- Thinkpad X250
- Samsung 860 Pro 512GB
- 4.19.4-arch1-1-ARCH
sda 8:0 0 477G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 476,5G 0 part
└─root 254:0 0 476,5G 0 crypt
├─volgrp_linux-rootvol 254:1 0 70G 0 lvm /
└─volgrp_linux-homevol 254:2 0 406,5G 0 lvm /home
Today, for the third time in the past three months, I got an fsck error at boot complaining that several inodes contain garbage. Each time I then ran fsck manually and accepted the default option to "clear inode". As a result, files were always missing from the root partition afterwards, which I fixed in a quick-and-dirty way by reinstalling the affected packages.
https://i.imgur.com/ZT7AhXA.jpg
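(Roughly the kind of cleanup I mean; a sketch, not the exact commands I ran, and the package names varied each time:)
# List packages with missing files, then reinstall them from the repos
pacman -Qk 2>/dev/null | grep -v ', 0 missing files'
pacman -S --noconfirm $(pacman -Qk 2>/dev/null | grep -v ', 0 missing files' | cut -d: -f1)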
Some observations:
- I recently switched from an Intel SSD to this new Samsung SSD; I never had this problem in the past ~5 years of using Arch
- the shutdown immediately preceding each fsck error was clean
- each time I reinstalled the whole laptop (new LUKS container and filesystem), and it then worked for about a month without issues
- each time it was exactly 16 inodes that "contained garbage"
- smartctl -a /dev/sda reports no CRC or any other errors IMHO (see below)
- I also ran a long SMART self-test, which passed
- the recovered files in /lost+found were intact (verified by checksum; see the sketch below)
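(Sketch of the checksum comparison, with a hypothetical recovered file and package; e2fsck names recovered entries '#<inode>':)
# Hash a recovered file ...
sha256sum '/lost+found/#396886'
# ... and compare against the pristine copy from the package cache (names hypothetical)
tar -xOf /var/cache/pacman/pkg/somepkg-1.0-1-x86_64.pkg.tar.xz usr/bin/somefile | sha256sum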
[root@arch]# smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.4-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 860 PRO 512GB
Serial Number: XXX
LU WWN Device Id: XXX
Firmware Version: RVM01B6Q
User Capacity: 512.110.190.592 bytes [512 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Nov 28 17:27:17 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 85) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1002
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 813
177 Wear_Leveling_Count 0x0013 099 099 000 Pre-fail Always - 5
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 071 051 000 Old_age Always - 29
195 Hardware_ECC_Recovered 0x001a 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 Unknown_Attribute 0x0012 099 099 000 Old_age Always - 32
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 4373500161
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1000 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I have already spent several hours searching the web, but all threads regarding "inode seems to contain garbage" seemed mostly related to people using a Raspberry Pi with a low-quality SD card.
So my questions are:
1) can this kind of corruption happen due to unclean shutdowns, or is it more likely related to a faulty SSD or connection?
2) which part of the filesystem got corrupted? Since the files found in /lost+found were intact (checksum), could this possibly be a superblock corruption?
3) what steps can be taken to narrow down the cause?
Please let me know if I can provide more information.
Thank you for reading.
Edit: Found some journalctl output regarding the inodes above:
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
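(For reference, the raw inode can be inspected with debugfs; the device path is taken from my lsblk output above:)
# debugfs opens the filesystem read-only by default
debugfs -R 'stat <396886>' /dev/mapper/volgrp_linux-rootvol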
Is this maybe related to https://lkml.org/lkml/2018/11/28/152?
Edit: typo
Last edited by cryptoluks (2018-12-06 19:51:50)
Offline
Yes, it could be related:
https://bugzilla.kernel.org/show_bug.cgi?id=201685
There is an obscure ext4 filesystem corruption on 4.19 kernels currently being worked on. It is still unclear if this is a bug in the ext4 driver itself or due to something else. I personally suspect the latter.
In the meantime, I suggest you switch to the LTS kernel while the developers investigate further.
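Roughly (assuming GRUB; adjust the last step for other bootloaders):
pacman -S linux-lts
grub-mkconfig -o /boot/grub/grub.cfg   # regenerate the boot entries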
Offline
FWIW, Samsung SSDs are quite notorious for issues with SATA link power management; if you search for it you will find quite a few threads on this, here and elsewhere on the internet. Try checking the behaviour by explicitly switching to max_performance: https://wiki.archlinux.org/index.php/Po … Management
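E.g., something like this (the host number may differ; the echo is not persistent across reboots):
# Show the current policy for every SATA host
cat /sys/class/scsi_host/host*/link_power_management_policy
# Force max_performance for host0 (run as root)
echo max_performance > /sys/class/scsi_host/host0/link_power_management_policy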
Offline
FWIW, Samsung SSDs are quite notorious for issues with SATA link power management; if you search for it you will find quite a few threads on this, here and elsewhere on the internet. Try checking the behaviour by explicitly switching to max_performance: https://wiki.archlinux.org/index.php/Po … Management
Thank you for the suggestion.
As stated in the Wiki, data loss should not occur from kernel 4.15 onwards with the new power setting "med_power_with_dipm". Nevertheless, I will give max_performance a try.
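If I understand TLP correctly, that should amount to something like the following in /etc/default/tlp (a sketch, untested):
# Force max_performance on both AC and battery
SATA_LINKPWR_ON_AC="max_performance"
SATA_LINKPWR_ON_BAT="max_performance"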
Offline
Yes, it could be related:
https://bugzilla.kernel.org/show_bug.cgi?id=201685
There is an obscure ext4 filesystem corruption on 4.19 kernels currently being worked on. It is still unclear if this is a bug in the ext4 driver itself or due to something else. I personally suspect the latter.
In the meantime, I suggest you switch to the LTS kernel while the developers investigate further.
Edit:
OK, I checked on which date this kind of issue first appeared: it was on 7 Sep 2018. According to the Arch Linux Archive, I was probably using kernel 4.18.1 at that time, which should probably not be affected by the specific ext4 corruption issue you linked above?! ("maybe one of 4.18.18 4.19.1 4.20-rc2")
Last edited by cryptoluks (2018-11-29 13:42:54)
Offline
Discussion of the patch says that the problem may have existed long before, but recent performance updates increased the likelihood of it occurring. Have a look at comments 269 and 276 to determine if you are one of the lucky few.
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c269
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c276
I guess the bug only occurs under very rare conditions and very specific (high) loads?
edit: a word
Last edited by velusip (2018-12-06 02:44:30)
Offline
Discussion of the patch says that the problem may have existed long before, but recent performance updates increased the likelihood of it occurring.
That issue was introduced by 6ce3dd6eec114930cf2035a8bcb1e80477ed79a8 in v4.19-rc1 according to the fix.
Offline
Discussion of the patch says that the problem may have existed long before, but recent performance updates increased the likelihood of it occurring. Have a look at comments 269 and 276 to determine if you are one of the lucky few.
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c269
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c276
cat /sys/block/sda/queue/scheduler
outputs
[mq-deadline] kyber bfq none
I guess the bug only occurs under very rare conditions and very specific (high) loads?
I suspect the effects of the corruption did not hit me immediately. I remember that the day before the third corruption I started IntelliJ IDEA, which caused my computer to freeze completely. Maybe that triggered something.
That issue was introduced by 6ce3dd6eec114930cf2035a8bcb1e80477ed79a8 in v4.19-rc1 according to the fix.
Either that bug was really introduced with 4.19-rc1 and I have a faulty SSD on top, or the bug was there long before 4.19-rc1, as is also indicated by this comment.
In the meantime I switched to btrfs to be a bit more resilient against filesystem corruption. I also updated to kernel 4.19.7.arch1-1, which should already include the patch.
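Since btrfs checksums both data and metadata, something like a periodic scrub should surface silent corruption:
btrfs scrub start -B /   # -B: run in the foreground and print statistics
btrfs device stats /     # cumulative per-device error counters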
Edit: fixed name of last quote
Last edited by cryptoluks (2018-12-06 16:43:03)
Offline
loqs wrote: That issue was introduced by 6ce3dd6eec114930cf2035a8bcb1e80477ed79a8 in v4.19-rc1 according to the fix.
Either that bug was really introduced with 4.19-rc1 and I have a faulty SSD on top, or the bug was there long before 4.19-rc1, as is also indicated by this comment.
In the meantime I switched to btrfs to be a bit more resilient against filesystem corruption. I also updated to kernel 4.19.7.arch1-1, which should already include the patch.
The dissenting voice on how long the bug has been present is the author of the commit that the git bisection blamed as the cause.
My understanding is that all filesystems were vulnerable to the bug; the only two that detected the corruption were ext4 and ZFS.
Edit:
[mq-deadline] kyber bfq none
Also, only the none scheduler was supposed to be affected (https://bugzilla.kernel.org/show_bug.cgi?id=201685#c276), and pre-4.19 the Arch kernel was not using blk-mq for scsi_mod, which covers SATA and SCSI devices.
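(For anyone wanting to check their running kernel, assuming the module parameter is exposed:)
cat /sys/module/scsi_mod/parameters/use_blk_mq   # prints Y or N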
Last edited by loqs (2018-12-06 17:54:52)
Offline
My understanding is that all filesystems were vulnerable to the bug; the only two that detected the corruption were ext4 and ZFS.
Yes. Most people reporting the issue are using ext4, but most Linux users in general are probably using ext4, so I don't think there is a causal connection between using ext4 and the corruption.
Are you implying that btrfs maybe can't detect this kind of corruption? btrfs can detect even single bit flips, so I assume the filesystem would recognize it.
Offline
FWIW, just to somewhat reinforce my suggestion: there was this 2-page thread with multiple users reporting corruption after the 4.16 kernel (at which point med_power_with_dipm was enabled by default for laptops), so while that mode should be better in general, there still seem to be issues specifically with Samsung drives.
Offline
FWIW, just to somewhat reinforce my suggestion: there was this 2-page thread with multiple users reporting corruption after the 4.16 kernel (at which point med_power_with_dipm was enabled by default for laptops), so while that mode should be better in general, there still seem to be issues specifically with Samsung drives.
Thank you for linking this thread. I had kind of a déjà vu while reading it.
kernel: perf: interrupt took too long (3960 > 3911), lowering kernel.perf_event_max_sample_rate to 50400
- I saw exactly such warnings too, but didn't pay much attention to them. Currently, with 4.19.7, I no longer see those warnings in the dmesg output, even with SATA power management set to TLP's default "med_power_with_dipm".
- I also had some freezes after resuming from sleep; not as often as the people in the thread above, but about 1-2 times a week.
If it is true that the kernel bug has only been present since 4.19-rc1, then maybe this is the solution.
Thanks to all for the suggestions.
Edit: typo
Last edited by cryptoluks (2018-12-06 19:50:43)
Offline
Even if the bug was present in 4.18, to trigger it you would need to run the kernel with the option scsi_mod.use_blk_mq=1 and with a udev rule that changed the scheduler to none for that device.
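For illustration, such a rule would look roughly like this (file name hypothetical):
# /etc/udev/rules.d/60-ioscheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="none"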
Offline
Even if the bug was present in 4.18, to trigger it you would need to run the kernel with the option scsi_mod.use_blk_mq=1 and with a udev rule that changed the scheduler to none for that device.
OK, thanks for clearing this up. That said, from kernel 4.19-rc1 upwards it was enough to use mq-deadline or one of the other I/O schedulers, as explained here, to trigger the bug? (Given that enough load on both I/O and CPU is produced.)
Then again, it is probably "just" the SATA power management thing.
Offline
loqs wrote: Even if the bug was present in 4.18, to trigger it you would need to run the kernel with the option scsi_mod.use_blk_mq=1 and with a udev rule that changed the scheduler to none for that device.
OK, thanks for clearing this up. That said, from kernel 4.19-rc1 upwards it was enough to use mq-deadline or one of the other I/O schedulers, as explained here, to trigger the bug? (Given that enough load on both I/O and CPU is produced.)
Which is then contradicted by this comment, which is from the block maintainer but does not give any reasoning as to why using a scheduler mitigated the issue.
The slightly longer patch https://git.kernel.org/pub/scm/linux/ke … b57f15c821 might cover the reasoning when a scheduler is present.
From 4.19.6, without the fix:
static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
                                                struct request *rq,
                                                blk_qc_t *cookie,
                                                bool bypass_insert)
{
        struct request_queue *q = rq->q;
        bool run_queue = true;

        /*
         * RCU or SRCU read lock is needed before checking quiesced flag.
         *
         * When queue is stopped or quiesced, ignore 'bypass_insert' from
         * blk_mq_request_issue_directly(), and return BLK_STS_OK to caller,
         * and avoid driver to try to dispatch again.
         */
        if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)) {
                run_queue = false;
                bypass_insert = false;
                goto insert;
        }

        if (q->elevator && !bypass_insert)
                goto insert;

        if (!blk_mq_get_dispatch_budget(hctx))
                goto insert;

        if (!blk_mq_get_driver_tag(rq)) {
                blk_mq_put_dispatch_budget(hctx);
                goto insert;
        }

        return __blk_mq_issue_directly(hctx, rq, cookie);
insert:
        if (bypass_insert)
                return BLK_STS_RESOURCE;

        blk_mq_sched_insert_request(rq, false, run_queue, false);
        return BLK_STS_OK;
}
The following condition is true whenever any scheduler is in use, so __blk_mq_issue_directly would never be called unless bypass_insert is true:
        if (q->elevator && !bypass_insert)
                goto insert;
and __blk_mq_issue_directly is the function containing the bug, so if that function is never called, the bug cannot occur.
static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
                                            struct request *rq,
                                            blk_qc_t *cookie)
{
        struct request_queue *q = rq->q;
        struct blk_mq_queue_data bd = {
                .rq = rq,
                .last = true,
        };
        blk_qc_t new_cookie;
        blk_status_t ret;

        new_cookie = request_to_qc_t(hctx, rq);

        /*
         * For OK queue, we are done. For error, caller may kill it.
         * Any other error (busy), just add it to our list as we
         * previously would have done.
         */
        ret = q->mq_ops->queue_rq(hctx, &bd);
        switch (ret) {
        case BLK_STS_OK:
                blk_mq_update_dispatch_busy(hctx, false);
                *cookie = new_cookie;
                break;
        case BLK_STS_RESOURCE:
        case BLK_STS_DEV_RESOURCE:
                blk_mq_update_dispatch_busy(hctx, true);
                __blk_mq_requeue_request(rq);
                break;
        default:
                blk_mq_update_dispatch_busy(hctx, false);
                *cookie = BLK_QC_T_NONE;
                break;
        }

        return ret;
}
Edit:
Covering the bypass_insert case:
void blk_mq_sched_insert_requests(struct request_queue *q,
                                  struct blk_mq_ctx *ctx,
                                  struct list_head *list, bool run_queue_async)
{
        struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
        struct elevator_queue *e = hctx->queue->elevator;

        if (e && e->type->ops.mq.insert_requests)
                e->type->ops.mq.insert_requests(hctx, list, false);
        else {
                /*
                 * try to issue requests directly if the hw queue isn't
                 * busy in case of 'none' scheduler, and this way may save
                 * us one extra enqueue & dequeue to sw queue.
                 */
                if (!hctx->dispatch_busy && !e && !run_queue_async) {
                        blk_mq_try_issue_list_directly(hctx, list);
                        if (list_empty(list))
                                return;
                }
                blk_mq_insert_requests(hctx, ctx, list);
        }

        blk_mq_run_hw_queue(hctx, run_queue_async);
}
Edit2:
https://elixir.bootlin.com/linux/v4.19. … t_directly shows the only callers of blk_mq_try_issue_list_directly.
Last edited by loqs (2018-12-06 22:01:04)
Offline
I switched to max_performance and have not experienced any of my issues again so far.
Thanks to all for your constructive hints, tips, and very detailed explanations. :-) You rock!
Offline