fsck: Inode xxxxxxx seems to contain garbage at boot.

cryptoluks · 2018-11-28 17:01:03

Hi,

My setup is as follows:
- Thinkpad X250
- Samsung 860 Pro 512GB
- 4.19.4-arch1-1-ARCH

sda                                   8:0    0   477G  0 disk  
├─sda1                                8:1    0   500M  0 part  /boot
└─sda2                                8:2    0 476,5G  0 part  
  └─root                            254:0    0 476,5G  0 crypt 
    ├─volgrp_linux-rootvol 254:1    0    70G  0 lvm   /
    └─volgrp_linux-homevol 254:2    0 406,5G  0 lvm   /home

Today, for the third time in the past three months, I got an fsck error at boot saying that several inodes contain garbage. Each time I ran fsck manually and accepted the default option to "clear inode". As a result, some files were always missing from the root partition, which I fixed the dirty way by reinstalling the affected packages.

https://i.imgur.com/ZT7AhXA.jpg

Some observations:

- I recently switched from an Intel SSD to this new Samsung SSD. I never had this problem in the past ~5 years of using Arch
- the shutdown immediately before each fsck error was clean
- each time, I reinstalled the whole laptop (new LUKS container and filesystem); afterwards it worked for about a month without issues
- each time, exactly 16 inodes "contained garbage"
- as far as I can tell, smartctl -a /dev/sda reports no CRC or other errors (see below)
- I ran a long SMART test, which also passed
- the files recovered to /lost+found were intact (checksums matched)
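The "exactly 16 inodes" observation has a simple arithmetic reading. The journal excerpt further down reports an inode size of 256 bytes; assuming the ext4 default 4 KiB block size, 4096 / 256 = 16 inodes share one inode-table block, so 16 damaged inodes would be consistent with one whole block being overwritten. The sketch below applies the inode lookup arithmetic from the ext4 disk-layout documentation (subtract 1, divide by inodes per group). The function and struct names are hypothetical, and 8192 inodes per group is an assumed common default, not a value read from this filesystem.

```c
#include <stdint.h>

/* Where an inode lives inside its block group's inode table. */
struct inode_loc {
    uint32_t group;        /* block group number */
    uint32_t table_block;  /* block index within that group's inode table */
    uint32_t byte_offset;  /* byte offset of the inode inside that block */
};

/* Hypothetical helper: inode numbers start at 1, so subtract 1 before
 * dividing by inodes_per_group. Geometry values are passed in by the caller;
 * they are assumptions unless taken from the real superblock. */
struct inode_loc locate_inode(uint32_t ino, uint32_t inodes_per_group,
                              uint32_t inode_size, uint32_t block_size)
{
    struct inode_loc loc;
    uint32_t index = (ino - 1) % inodes_per_group;
    uint64_t offset = (uint64_t)index * inode_size; /* bytes into the table */

    loc.group = (ino - 1) / inodes_per_group;
    loc.table_block = (uint32_t)(offset / block_size);
    loc.byte_offset = (uint32_t)(offset % block_size);
    return loc;
}
```

Under these assumptions, locate_inode(396883, 8192, 256, 4096) and locate_inode(396886, 8192, 256, 4096) both land in group 48, inode-table block 229 (at byte offsets 512 and 1280), i.e. the two inodes from the journal share one 4 KiB block. The real geometry ("Inodes per group", "Inode size", "Block size") can be checked with dumpe2fs -h.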

[root@arch]# smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.4-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 860 PRO 512GB
Serial Number:    XXX
LU WWN Device Id: XXX
Firmware Version: RVM01B6Q
User Capacity:    512.110.190.592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Nov 28 17:27:17 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1002
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       813
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       5
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   071   051   000    Old_age   Always       -       29
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       32
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       4373500161

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1000         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
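To recheck the attribute table above for the counters that would most directly point at media or link trouble, a rough C sketch can parse each row and flag nonzero raw values. The function names and the watch list (IDs 5, 179, 183, 187, 199, picked from the attributes in this output) are my own hypothetical choices, and the column layout is assumed to match this smartctl 6.6 output.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse one attribute row, e.g.
 *   "199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0"
 * Columns follow the ID#/ATTRIBUTE_NAME/.../RAW_VALUE header.
 * Returns true and fills id/raw on success; header lines fail. */
bool parse_smart_row(const char *line, int *id, unsigned long long *raw)
{
    char name[64];
    /* %*s / %*d skip FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED */
    int n = sscanf(line, "%d %63s %*s %*d %*d %*d %*s %*s %*s %llu",
                   id, name, raw);
    return n == 3;
}

/* Hypothetical watch list: attributes whose raw value is 0 in the output
 * above and whose growth would suggest media or cable problems. */
bool is_watched_attr(int id)
{
    switch (id) {
    case 5:   /* Reallocated_Sector_Ct */
    case 179: /* Used_Rsvd_Blk_Cnt_Tot */
    case 183: /* Runtime_Bad_Block */
    case 187: /* Reported_Uncorrect */
    case 199: /* UDMA_CRC_Error_Count (link/cable) */
        return true;
    default:
        return false;
    }
}

/* True if the row parses, is on the watch list and has a nonzero raw value. */
bool smart_row_is_suspicious(const char *line)
{
    int id;
    unsigned long long raw;
    return parse_smart_row(line, &id, &raw) && is_watched_attr(id) && raw != 0;
}
```

Feeding every line of smartctl -a through smart_row_is_suspicious() would report nothing for the output above, which matches the "no errors" reading; a growing UDMA_CRC_Error_Count would instead hint at the SATA link rather than the flash.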

I have already spent several hours searching the web, but all the threads about "inode seems to contain garbage" seemed to be mostly from people using a Raspberry Pi with a low-quality SD card.

So my questions are:
1) Can this kind of corruption happen due to unclean shutdowns, or is it more likely caused by a faulty SSD or connection?
2) Which part of the filesystem got corrupted? Since the files found in /lost+found were intact (checksums matched), could this be a superblock corruption?
3) What steps can I take to narrow down the cause?

Please let me know if I can provide more information.

Thank you for reading.

Edit: I found some journalctl output regarding the inodes above:

Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396883: comm fd: bad extra_isize 15020 (inode size 256)
Nov 27 linux kernel: EXT4-fs error (device dm-1): ext4_iget:4831: inode #396886: comm fd: bad extra_isize 28454 (inode size 256)
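The "bad extra_isize" messages come from a size bound: an ext4 inode consists of the original 128-byte ext2/ext3 part plus i_extra_isize extra bytes, and both together must fit into the on-disk inode size (256 here). A minimal sketch of just that bound, simplified (other checks the kernel may apply are not modeled; the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the original ext2/ext3 inode (mirrors EXT4_GOOD_OLD_INODE_SIZE). */
#define GOOD_OLD_INODE_SIZE 128u

/* True if the fixed 128-byte part plus the extra area fits in the inode. */
bool extra_isize_fits(uint32_t extra_isize, uint32_t inode_size)
{
    return GOOD_OLD_INODE_SIZE + extra_isize <= inode_size;
}
```

With a 256-byte inode at most 128 extra bytes fit, so values like 28454 and 15020 are far out of range. That looks more like the inode bytes being overwritten with unrelated data than like a single slightly wrong field.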

Is this maybe related to https://lkml.org/lkml/2018/11/28/152?

Edit: typo

Last edited by cryptoluks (2018-12-06 19:51:50)

ilsensine · 2018-11-29 06:43:21

Yes, it could be related:
https://bugzilla.kernel.org/show_bug.cgi?id=201685
There is an obscure ext4 filesystem corruption on 4.19 kernels that is currently being worked on. It is still unclear whether this is a bug in the ext4 driver itself or is caused by something else. I personally suspect the latter.
In the meantime, I suggest you switch to the LTS kernel while the developers investigate further.

V1del · 2018-11-29 08:45:50

FWIW, Samsung SSDs are quite notorious for issues with SATA link power management; if you search for it, you will find quite a few threads about this here and elsewhere on the internet. Try checking the behaviour by explicitly switching to max_performance: https://wiki.archlinux.org/index.php/Po … Management
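For a quick test without rebooting, the policy can be written to the per-host sysfs attribute, the C equivalent of echo max_performance > /sys/class/scsi_host/host0/link_power_management_policy. A minimal sketch, assuming the disk sits behind host0 (the host number differs between machines) and that the program runs as root; the change does not persist across reboots, and a power management tool such as TLP may set the value again. Function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Write a link power management policy string to a sysfs attribute.
 * Returns 0 on success, -1 on failure. */
int write_lpm_policy(const char *path, const char *policy)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = (fputs(policy, f) < 0) ? -1 : 0;
    /* stdio buffers the string, so a value rejected by the kernel
     * typically surfaces as an fclose() failure */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Read the current policy back into buf, stripping the trailing newline. */
int read_lpm_policy(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Reading the attribute back after suspend/resume is a cheap way to confirm that nothing switched the policy back to med_power_with_dipm.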

cryptoluks · 2018-11-29 13:15:11

V1del wrote:

FWIW, Samsung SSDs are quite notorious for issues with SATA link power management; if you search for it, you will find quite a few threads about this here and elsewhere on the internet. Try checking the behaviour by explicitly switching to max_performance: https://wiki.archlinux.org/index.php/Po … Management

Thank you for the suggestion.

As stated in the wiki, data loss should not occur from kernel 4.15 onwards with the new power setting "med_power_with_dipm". Nevertheless, I will give max_performance a try.

cryptoluks · 2018-11-29 13:25:00

ilsensine wrote:

Yes, it could be related:
https://bugzilla.kernel.org/show_bug.cgi?id=201685
There is an obscure ext4 filesystem corruption on 4.19 kernels that is currently being worked on. It is still unclear whether this is a bug in the ext4 driver itself or is caused by something else. I personally suspect the latter.
In the meantime, I suggest you switch to the LTS kernel while the developers investigate further.

Edit:

OK, I checked when this kind of issue first appeared: it was on 7 September 2018. Judging by the Arch Linux Archive, I was probably using kernel 4.18.1 at the time, which should not be affected by the specific ext4 corruption issue you linked above, should it? ("maybe one of 4.18.18 4.19.1 4.20-rc2")

Last edited by cryptoluks (2018-11-29 13:42:54)

velusip · 2018-12-06 02:43:58

The discussion of the patch says that the problem may have existed long before, but recent performance updates increased the likelihood of it occurring. Have a look at comments 269 and 276 to determine whether you are one of the lucky few.

https://bugzilla.kernel.org/show_bug.cgi?id=201685#c269
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c276

I guess the bug only occurs under very rare conditions and very specific (high) loads?

edit: a word

Last edited by velusip (2018-12-06 02:44:30)

loqs · 2018-12-06 11:40:55

velusip wrote:

The discussion of the patch says that the problem may have existed long before, but recent performance updates increased the likelihood of it occurring.

That issue was introduced by 6ce3dd6eec114930cf2035a8bcb1e80477ed79a8 in v4.19-rc1 according to the fix.
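A quick way to apply that to a given machine is to compare the uname -r release string against 4.19. A minimal sketch with hypothetical function names; it only encodes the introduction point stated above (v4.19-rc1, with release candidates treated as their base version) and does not check whether a particular 4.19.x already contains the fix.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse the leading "major.minor[.patch]" of a release string such as
 * "4.19.4-arch1-1-ARCH" or "4.20-rc2". A missing patch level counts as 0. */
bool parse_kernel_release(const char *rel, int *major, int *minor, int *patch)
{
    *patch = 0;
    return sscanf(rel, "%d.%d.%d", major, minor, patch) >= 2;
}

/* True if the release is 4.19 or newer, i.e. at or after the merge window
 * in which the bug was introduced according to the fix. */
bool at_least_4_19(const char *rel)
{
    int major, minor, patch;
    if (!parse_kernel_release(rel, &major, &minor, &patch))
        return false;
    return major > 4 || (major == 4 && minor >= 19);
}
```

By this check, the 4.18.1 kernel from the first incident on 7 September falls outside the range, while 4.19.4-arch1-1-ARCH from the original post falls inside it.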

cryptoluks · 2018-12-06 16:42:27

velusip wrote:

The discussion of the patch says that the problem may have existed long before, but recent performance updates increased the likelihood of it occurring. Have a look at comments 269 and 276 to determine whether you are one of the lucky few.
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c269
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c276

cat /sys/block/sda/queue/scheduler

outputs

[mq-deadline] kyber bfq none
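In that file the kernel marks the scheduler currently in use with square brackets, so here it is mq-deadline rather than none. A small sketch for extracting the active entry from the file's contents (hypothetical function name; if no entry is bracketed, it simply reports failure):

```c
#include <stdbool.h>
#include <string.h>

/* Extract the bracketed (active) scheduler from the contents of
 * /sys/block/<dev>/queue/scheduler, e.g. "[mq-deadline] kyber bfq none".
 * Copies the name into out (size bytes) and returns true on success. */
bool active_scheduler(const char *line, char *out, size_t size)
{
    const char *open = strchr(line, '[');
    if (open == NULL)
        return false;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return false;
    size_t len = (size_t)(close - open - 1);
    if (len == 0 || len >= size)
        return false;   /* empty name or output buffer too small */
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return true;
}
```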

velusip wrote:

I guess the bug only occurs under very rare conditions and very specific (high) loads?

I suspect the effects of the corruption did not hit me immediately. I remember that the day before the third corruption I started IntelliJ IDEA, which caused my computer to freeze completely. Maybe that triggered something.

loqs wrote:

That issue was introduced by 6ce3dd6eec114930cf2035a8bcb1e80477ed79a8 in v4.19-rc1 according to the fix.

Either that bug really was introduced in 4.19-rc1 and I have a faulty SSD, or the bug existed long before 4.19-rc1, as this comment also suggests.

In the meantime, I switched to BTRFS to be a bit more resilient against filesystem corruption. I also updated to kernel 4.19.7.arch1-1, which should already include the patch.

Edit: fixed name of last quote

Last edited by cryptoluks (2018-12-06 16:43:03)

loqs · 2018-12-06 17:31:06

cryptoluks wrote:

loqs wrote:
That issue was introduced by 6ce3dd6eec114930cf2035a8bcb1e80477ed79a8 in v4.19-rc1 according to the fix.
Either that bug really was introduced in 4.19-rc1 and I have a faulty SSD, or the bug existed long before 4.19-rc1, as this comment also suggests.
In the meantime, I switched to BTRFS to be a bit more resilient against filesystem corruption. I also updated to kernel 4.19.7.arch1-1, which should already include the patch.

The dissenting voice on how long the bug has been present is the author of the commit that the git bisection blamed as the cause.
My understanding is that all filesystems were vulnerable to the bug; the only two that detected the corruption were ext4 and ZFS.
Edit:

[mq-deadline] kyber bfq none

Also, only the none scheduler was supposed to be affected (https://bugzilla.kernel.org/show_bug.cgi?id=201685#c276), and before 4.19 the Arch kernel was not using MQ for scsi_mod, which covers SATA and SCSI devices.

Last edited by loqs (2018-12-06 17:54:52)

cryptoluks · 2018-12-06 18:18:39

loqs wrote:

My understanding is that all filesystems were vulnerable to the bug; the only two that detected the corruption were ext4 and ZFS.

Yes. Most people reporting the issue are using ext4, but most Linux users in general are probably using ext4 too. So I don't think there is a causal connection between using ext4 and the corruption.

Are you suggesting that btrfs may not be able to detect this kind of corruption? btrfs can detect even single bit flips, so I assume the filesystem would recognize it.
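The bit-flip detection mentioned here rests on checksums that btrfs stores for data and metadata, CRC32C by default. A minimal bitwise CRC32C sketch (Castagnoli polynomial, reflected form 0x82F63B78) shows the principle: flipping any single bit of a 4 KiB block changes the checksum. This is only an illustration, not btrfs's on-disk format or its kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected, init and final XOR 0xFFFFFFFF. */
uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

crc32c("123456789", 9) yields the standard check value 0xE3069283. Detection alone does not restore the data; that needs a redundant copy.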

V1del · 2018-12-06 18:30:15

FWIW, just to reinforce my suggestion somewhat: there was this two-page thread with multiple users reporting corruption after the 4.16 kernel (at which point med_power_with_dipm was enabled by default for laptops). While that mode should be better in general, there still seem to be issues specifically with Samsung drives.

cryptoluks · 2018-12-06 19:48:47

V1del wrote:

FWIW, just to reinforce my suggestion somewhat: there was this two-page thread with multiple users reporting corruption after the 4.16 kernel (at which point med_power_with_dipm was enabled by default for laptops). While that mode should be better in general, there still seem to be issues specifically with Samsung drives.

Thank you for linking this post. I have a kind of déjà vu while reading it.

kernel: perf: interrupt took too long (3960 > 3911), lowering kernel.perf_event_max_sample_rate to 50400

- I saw exactly such warnings too, but didn't pay much attention to them. Currently, with 4.19.7, I don't see those warnings in the dmesg output, even with SATA power management set to the TLP default "med_power_with_dipm".

- I also had some freezes after resuming from sleep. Not as often as the people in the post above, but about 1-2 times a week.

If it is true that the kernel bug was only present since 4.19-rc1 then maybe this is the solution.

Thanks to all of you for the suggestions.

Edit: typo

Last edited by cryptoluks (2018-12-06 19:50:43)

loqs · 2018-12-06 20:02:12

Even if the bug was present in 4.18, to trigger it you would need to run a kernel with the option scsi_mod.use_blk_mq=1 and a udev rule that changes the scheduler for that device to none.
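That two-part condition can be written down as a predicate over the kernel command line (the contents of /proc/cmdline) and the active scheduler. A sketch with hypothetical function names; it only recognizes the literal spelling scsi_mod.use_blk_mq=1 and encodes loqs's description for 4.18, nothing more.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if `param` occurs as a whole whitespace-separated token in a
 * kernel command line string. */
bool cmdline_has_token(const char *cmdline, const char *param)
{
    size_t n = strlen(param);
    if (n == 0)
        return false;
    for (const char *p = strstr(cmdline, param); p != NULL;
         p = strstr(p + 1, param)) {
        bool starts = (p == cmdline) || isspace((unsigned char)p[-1]);
        bool ends = p[n] == '\0' || isspace((unsigned char)p[n]);
        if (starts && ends)
            return true;
    }
    return false;
}

/* loqs's 4.18 condition: blk-mq enabled for SCSI/SATA on the command line
 * AND the device's active scheduler set to "none". */
bool exposed_on_4_18(const char *cmdline, const char *active_sched)
{
    return cmdline_has_token(cmdline, "scsi_mod.use_blk_mq=1")
        && strcmp(active_sched, "none") == 0;
}
```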

cryptoluks · 2018-12-06 20:50:48

loqs wrote:

Even if the bug was present in 4.18, to trigger it you would need to run a kernel with the option scsi_mod.use_blk_mq=1 and a udev rule that changes the scheduler for that device to none.

OK, thanks for clearing this up. That said, from kernel 4.19-rc1 upwards, was it enough to use mq-deadline or one of the other I/O schedulers, as explained here, to trigger the bug (given that enough I/O and CPU load is produced)?

Then again, it is probably "just" the SATA power management issue.

loqs · 2018-12-06 21:46:26

cryptoluks wrote:

loqs wrote:
Even if the bug was present in 4.18, to trigger it you would need to run a kernel with the option scsi_mod.use_blk_mq=1 and a udev rule that changes the scheduler for that device to none.
OK, thanks for clearing this up. That said, from kernel 4.19-rc1 upwards, was it enough to use mq-deadline or one of the other I/O schedulers, as explained here, to trigger the bug (given that enough I/O and CPU load is produced)?

That is contradicted by this, which comes from the block maintainer but does not explain why using a scheduler mitigated the issue.
The slightly longer patch https://git.kernel.org/pub/scm/linux/ke … b57f15c821 might cover the reasoning for the case where a scheduler is present.
From 4.19.6, without the fix:

static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
						struct request *rq,
						blk_qc_t *cookie,
						bool bypass_insert)
{
	struct request_queue *q = rq->q;
	bool run_queue = true;

	/*
	 * RCU or SRCU read lock is needed before checking quiesced flag.
	 *
	 * When queue is stopped or quiesced, ignore 'bypass_insert' from
	 * blk_mq_request_issue_directly(), and return BLK_STS_OK to caller,
	 * and avoid driver to try to dispatch again.
	 */
	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)) {
		run_queue = false;
		bypass_insert = false;
		goto insert;
	}

	if (q->elevator && !bypass_insert)
		goto insert;

	if (!blk_mq_get_dispatch_budget(hctx))
		goto insert;

	if (!blk_mq_get_driver_tag(rq)) {
		blk_mq_put_dispatch_budget(hctx);
		goto insert;
	}

	return __blk_mq_issue_directly(hctx, rq, cookie);
insert:
	if (bypass_insert)
		return BLK_STS_RESOURCE;

	blk_mq_sched_insert_request(rq, false, run_queue, false);
	return BLK_STS_OK;
}

The following if condition is true whenever any scheduler is in use, so __blk_mq_issue_directly is never called unless bypass_insert is true:

	if (q->elevator && !bypass_insert)
		goto insert;

And __blk_mq_issue_directly is the function containing the bug, so if that function is never called, the bug cannot occur.

static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
					    struct request *rq,
					    blk_qc_t *cookie)
{
	struct request_queue *q = rq->q;
	struct blk_mq_queue_data bd = {
		.rq = rq,
		.last = true,
	};
	blk_qc_t new_cookie;
	blk_status_t ret;

	new_cookie = request_to_qc_t(hctx, rq);

	/*
	 * For OK queue, we are done. For error, caller may kill it.
	 * Any other error (busy), just add it to our list as we
	 * previously would have done.
	 */
	ret = q->mq_ops->queue_rq(hctx, &bd);
	switch (ret) {
	case BLK_STS_OK:
		blk_mq_update_dispatch_busy(hctx, false);
		*cookie = new_cookie;
		break;
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
		blk_mq_update_dispatch_busy(hctx, true);
		__blk_mq_requeue_request(rq);
		break;
	default:
		blk_mq_update_dispatch_busy(hctx, false);
		*cookie = BLK_QC_T_NONE;
		break;
	}

	return ret;
}

Edit: covering the bypass_insert case:

void blk_mq_sched_insert_requests(struct request_queue *q,
				  struct blk_mq_ctx *ctx,
				  struct list_head *list, bool run_queue_async)
{
	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
	struct elevator_queue *e = hctx->queue->elevator;

	if (e && e->type->ops.mq.insert_requests)
		e->type->ops.mq.insert_requests(hctx, list, false);
	else {
		/*
		 * try to issue requests directly if the hw queue isn't
		 * busy in case of 'none' scheduler, and this way may save
		 * us one extra enqueue & dequeue to sw queue.
		 */
		if (!hctx->dispatch_busy && !e && !run_queue_async) {
			blk_mq_try_issue_list_directly(hctx, list);
			if (list_empty(list))
				return;
		}
		blk_mq_insert_requests(hctx, ctx, list);
	}

	blk_mq_run_hw_queue(hctx, run_queue_async);
}

Edit2:
https://elixir.bootlin.com/linux/v4.19. … t_directly shows the only callers of blk_mq_try_issue_list_directly.
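The argument above can be condensed into a small userspace model of the branch structure in the two 4.19.6 excerpts. This is a sketch, not kernel code: the enum and function names are hypothetical, kernel state is reduced to booleans, and locking and side effects are omitted. It shows that with an elevator attached, a single request reaches the direct-issue path only when bypass_insert is true, and that the list path in blk_mq_sched_insert_requests() does not try direct issue at all when an elevator is attached. Which callers pass bypass_insert = true is the question the Edit2 link addresses.

```c
#include <stdbool.h>

/* Outcomes of the simplified model. */
enum issue_path {
    PATH_ISSUE_DIRECTLY,  /* would call __blk_mq_issue_directly() */
    PATH_SCHED_INSERT,    /* blk_mq_sched_insert_request() */
    PATH_RETURN_RESOURCE  /* BLK_STS_RESOURCE returned to the caller */
};

/* Branch order of __blk_mq_try_issue_directly() in 4.19.6. */
enum issue_path model_try_issue(bool stopped_or_quiesced, bool has_elevator,
                                bool bypass_insert, bool got_budget,
                                bool got_driver_tag)
{
    if (stopped_or_quiesced)
        return PATH_SCHED_INSERT;   /* kernel forces bypass_insert = false */
    if (has_elevator && !bypass_insert)
        goto insert;
    if (!got_budget)
        goto insert;
    if (!got_driver_tag)
        goto insert;                /* kernel also puts the budget back */
    return PATH_ISSUE_DIRECTLY;
insert:
    return bypass_insert ? PATH_RETURN_RESOURCE : PATH_SCHED_INSERT;
}

/* Condition under which blk_mq_sched_insert_requests() tries direct issue:
 * idle hw queue, no elevator, synchronous queue run. */
bool model_list_tries_direct(bool dispatch_busy, bool has_elevator,
                             bool run_queue_async)
{
    return !dispatch_busy && !has_elevator && !run_queue_async;
}
```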

Last edited by loqs (2018-12-06 22:01:04)

cryptoluks · 2018-12-20 16:24:57

I switched to max_performance and have not experienced any of my issues again so far.

Thanks to all of you for your constructive hints, tips and very detailed explanations. :-) You rock!

Arch Linux