#1 2015-05-21 19:22:39

lhog
Member
Registered: 2014-01-21
Posts: 9

HDD IO related system freeze/crash

Hi!

My system freezes or crashes whenever a moderately heavy I/O load is placed on my hard drive.

The console is flooded with messages like this:

May 21 21:28:00 arch-PC kernel: BUG: Bad page state in process pacman  pfn:398502
May 21 21:28:00 arch-PC kernel: page:ffffea000e614080 count:12582912 mapcount:0 mapping:          (null) index:0x2
May 21 21:28:00 arch-PC kernel: flags: 0x2fffe0000000000()
May 21 21:28:00 arch-PC kernel: page dumped because: nonzero _count
May 21 21:28:00 arch-PC kernel: Modules linked in: nls_utf8 fuse btrfs xor raid6_pq ufs hfsplus hfs minix ntfs vfat msdos fat jfs ext4 crc16 mbcache jbd2 dm_m
May 21 21:28:01 arch-PC kernel:  drm_kms_helper ttm drm i2c_core
May 21 21:28:01 arch-PC kernel: CPU: 7 PID: 1242 Comm: pacman Not tainted 4.0.4-1-ARCH #1
May 21 21:28:01 arch-PC kernel: Hardware name: System manufacturer System Product Name/P7P55D PRO, BIOS 1002    11/13/2009
May 21 21:28:01 arch-PC kernel:  0000000000000000 00000000c6ea1ed7 ffff88042a5838f8 ffffffff81571f43
May 21 21:28:01 arch-PC kernel:  0000000000000000 ffffea000e614080 ffff88042a583928 ffffffff811636dc
May 21 21:28:01 arch-PC kernel:  ffffffff8117e2f0 ffff88043fcf78f8 0000000000000246 ffff88042a583ad8
May 21 21:28:01 arch-PC kernel: Call Trace:
May 21 21:28:01 arch-PC kernel:  [<ffffffff81571f43>] dump_stack+0x4c/0x6e
May 21 21:28:01 arch-PC kernel:  [<ffffffff811636dc>] bad_page.part.12+0xbc/0x110
May 21 21:28:01 arch-PC kernel:  [<ffffffff8117e2f0>] ? zone_statistics+0x80/0xa0
May 21 21:28:01 arch-PC kernel:  [<ffffffff8116733d>] get_page_from_freelist+0x5bd/0x9d0
May 21 21:28:01 arch-PC kernel:  [<ffffffff811679cc>] __alloc_pages_nodemask+0x17c/0x9f0
May 21 21:28:01 arch-PC kernel:  [<ffffffff812c4a62>] ? radix_tree_lookup_slot+0x22/0x50
May 21 21:28:01 arch-PC kernel:  [<ffffffff8115e12a>] ? find_get_entry+0x6a/0xd0
May 21 21:28:01 arch-PC kernel:  [<ffffffff811aed91>] alloc_pages_current+0x91/0x110
May 21 21:28:01 arch-PC kernel:  [<ffffffff8115e887>] __page_cache_alloc+0xa7/0xd0
May 21 21:28:01 arch-PC kernel:  [<ffffffff8115e942>] pagecache_get_page+0x92/0x1f0
May 21 21:28:01 arch-PC kernel:  [<ffffffff8115ecca>] grab_cache_page_write_begin+0x2a/0x50
May 21 21:28:01 arch-PC kernel:  [<ffffffffa0487494>] xfs_vm_write_begin+0x34/0xf0 [xfs]
May 21 21:28:01 arch-PC kernel:  [<ffffffff8115e6be>] generic_perform_write+0xbe/0x1e0
May 21 21:28:01 arch-PC kernel:  [<ffffffffa0495714>] xfs_file_buffered_aio_write.isra.1+0xf4/0x280 [xfs]
May 21 21:28:01 arch-PC kernel:  [<ffffffff811e931e>] ? user_path_at_empty+0x6e/0xd0
May 21 21:28:01 arch-PC kernel:  [<ffffffffa0495920>] xfs_file_write_iter+0x80/0x120 [xfs]
May 21 21:28:01 arch-PC kernel:  [<ffffffff811d7bc1>] new_sync_write+0x91/0xd0
May 21 21:28:01 arch-PC kernel:  [<ffffffff811d8383>] vfs_write+0xb3/0x200
May 21 21:28:01 arch-PC kernel:  [<ffffffff811d9069>] SyS_write+0x59/0xd0
May 21 21:28:01 arch-PC kernel:  [<ffffffff81577849>] system_call_fastpath+0x12/0x17
May 21 21:28:01 arch-PC kernel: Disabling lock debugging due to kernel taint

Tested on both ext4 and xfs (tried both on the same partition). The hard drive itself is healthy: one of its partitions works fine under Windows.

The load that triggers the freeze was generated by pacman (unpacking stage) and by bonnie++.

The freezes didn't happen about half a year ago, but I'm not sure when they started. Current system:

uname -a
Linux arch-PC 4.0.4-1-ARCH #1 SMP PREEMPT Mon May 18 06:43:19 CEST 2015 x86_64 GNU/Linux

#2 2015-05-21 20:46:15

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: HDD IO related system freeze/crash

If it is a desktop machine then check if all cables are properly connected. Also I would try to change the power supply (if you can borrow a known good one) and retest.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

#3 2015-05-21 21:22:42

lhog
Member
Registered: 2014-01-21
Posts: 9

Re: HDD IO related system freeze/crash

R00KIE wrote:

If it is a desktop machine then check if all cables are properly connected. Also I would try to change the power supply (if you can borrow a known good one) and retest.

I did that before posting: vacuumed out the dust, then checked and replugged the SATA and power cables.
This has nothing to do with a hardware problem. I'm writing this from the same PC, but under Windows.

My other two PCs (laptops) on the same kernel/distro don't show anything similar. We need to go deeper, but I don't know where or how.

#4 2015-05-21 21:54:56

lhog
Member
Registered: 2014-01-21
Posts: 9

Re: HDD IO related system freeze/crash

Funnily enough, it looks like the DRAM is to blame. I bought a 16 GB upgrade recently...

I also checked whether writing to USB causes a crash, and it did. I'm now running memtest86+, and it reports some faulty (red) addresses.

P.S. I wonder why the faulty memory never caused an exception under Windows.

#5 2015-05-21 22:05:53

lhog
Member
Registered: 2014-01-21
Posts: 9

Re: HDD IO related system freeze/crash

Confirmed: removing one DIMM resolved the issue. Case closed.

P.S. I never thought I was likely to end up with faulty memory.

#6 2015-05-22 17:10:14

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,172

Re: HDD IO related system freeze/crash

lhog wrote:

P.S. I wonder why the faulty memory never caused an exception under Windows.

The error was probably in the upper part of physical memory.
Linux tries to use all available memory, while Windows tries to minimize the memory it uses.
(And with its memory management method it badly NEEDS to do that.)
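
A minimal sketch of how the Linux side of this could be checked, assuming a Linux machine with the usual /proc/meminfo file. The sample figures below are made up for illustration and are not taken from this thread; `parse_meminfo` and `cache_share` are hypothetical helper names. It only shows how much RAM currently holds page cache and says nothing about how Windows manages memory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_share(fields):
    """Fraction of MemTotal that currently holds page cache ("Cached")."""
    return fields["Cached"] / fields["MemTotal"]


# Made-up sample values for illustration only.
sample = """MemTotal:       16384000 kB
MemFree:          512000 kB
Cached:         12288000 kB
"""

info = parse_meminfo(sample)
print(f"{cache_share(info):.0%} of RAM holds page cache")  # 75% with the sample above
```

On a real system the input would be `open("/proc/meminfo").read()` instead of the sample string. A large "Cached" share after heavy writes would be consistent with file data being spread over otherwise free pages, which is one plausible reason a defect at a high address shows up under I/O load.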


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

#7 2015-05-22 18:09:44

nstgc
Member
Registered: 2014-03-17
Posts: 393

Re: HDD IO related system freeze/crash

Do you think it looks like this? https://bbs.archlinux.org/viewtopic.php?id=195687

#8 2015-05-22 19:06:38

lhog
Member
Registered: 2014-01-21
Posts: 9

Re: HDD IO related system freeze/crash

Lone_Wolf wrote:

P.S. I wonder why the faulty memory never caused an exception under Windows.

The error was probably in the upper part of physical memory.
Linux tries to use all available memory, while Windows tries to minimize the memory it uses.
(And with its memory management method it badly NEEDS to do that.)

Exactly. It was the second DIMM, and the faulty area was around 15.7 GB. If it weren't for Linux, I would probably never have known that I had a faulty memory region. Under Windows I played all the modern games, had tons of browser tabs open, and had a few IDEs running at the same time, and it never failed me. From that I conclude that 16 GB of RAM is huge overkill for a typical desktop.
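
As a rough cross-check, the numbers in the first log message of this thread can be decoded by hand. The sketch below assumes 4 KiB pages (the x86_64 default) and that the kernel prints `pfn` in hexadecimal and `count` in decimal; `pfn_to_phys` and `set_bits` are hypothetical helper names.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86_64 default


def pfn_to_phys(pfn_hex, page_size=PAGE_SIZE):
    """Return the physical byte address at which the page frame starts."""
    return int(pfn_hex, 16) * page_size


def set_bits(value):
    """Return the positions of the bits that are set in value."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


addr = pfn_to_phys("398502")          # pfn:398502 from the log in post #1
print(hex(addr))                      # 0x398502000
print(round(addr / 10**9, 2), "GB")   # 15.44 GB
print(round(addr / 2**30, 2), "GiB")  # 14.38 GiB
print(set_bits(12582912))             # count:12582912 -> [22, 23]
```

The page sits near the top of a 16 GB machine, in the same neighbourhood as the region memtest86+ flagged, although the two addresses are not identical. A page taken from the free list should have a count of zero; a value with exactly two adjacent bits set looks more like flipped bits than a real reference count, but that is suggestive rather than proof.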

@nstgc, no, it didn't look like that for me. The only pattern I could reproduce was that the crashes happened while writing to the disk or a USB stick. My speculation is that the kernel allocates write cache memory somewhere near the top of physical memory.
