#1 2015-04-05 14:47:29

nstgc
Member
Registered: 2014-03-17
Posts: 393

System lock ups when using hard disks

At first I thought I had the same issue as in this thread; however, after reading his logs I decided that was unlikely, since the topic opener seems to be getting a bunch of

Apr  4 17:00:15 daxteriv slim[708]: (Do:882): GLib-CRITICAL **: Source ID 46 was not found when attempting to remove it

, so I decided to start a new thread.

I am not getting anything in my logs when I look at them after the crashes. It seems as if whatever is causing the issue makes it impossible to write to the drives, so nothing looks out of place until it is too late and quite obvious that something is wrong. The sole exception was a complaint about ata9.00; however, this seems to have been a red herring, since completely removing that drive solved nothing, though it did make the error go away.

I have run memtest and prime95 (in Windows, since at the time I didn't want to boot my system), and both came back negative. I also checked for SMART errors, particularly UDMA errors. Again, negative.
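
For anyone repeating the SMART check above, here is a minimal Python sketch that pulls a few raw attribute values out of text saved from `smartctl -A`. It assumes the usual ten-column attribute table; the attribute names in `WATCHED` and the function name `raw_values` are illustrative assumptions, and names differ between drive vendors.

```python
# Attribute names are assumptions: they vary by drive vendor and smartctl version.
WATCHED = ("UDMA_CRC_Error_Count", "Reallocated_Sector_Ct", "Current_Pending_Sector")


def raw_values(smartctl_output, names=WATCHED):
    """Map each watched attribute name to its RAW_VALUE column (kept as text)."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            found[fields[1]] = fields[9]
    return found
```

The function only reads text, so it can be fed the saved output for each drive in turn. A non-zero UDMA CRC count usually points at cabling or the controller rather than the platters, which is why that attribute is singled out here.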

Here are some previous threads of mine that I believe to be all related to the same issue:

https://bbs.archlinux.org/viewtopic.php?id=195449
https://bbs.archlinux.org/viewtopic.php?id=193937
https://bbs.archlinux.org/viewtopic.php?id=192704

[edit3] My previous thought was that this was a btrfs issue; however, now I think it may be more hardware related.[/edit3]

Last night I was writing urandom to a new drive, and I woke up in the morning to see that it had stopped prematurely, but not due to lack of space. The system had not locked up completely, which was new; however, as soon as I did something that accessed the disks (I tried using the Gnome Dash), all hell broke loose. I immediately hit the power button, which should start a shutdown. I didn't know about REISUB, or I would have tried that. In any case, the shutdown failed and I had to hard reboot.

The evening before I was playing games just fine and doing normal activities.

Also, the other day, while booted into the live CD, I think it started to crash, since it was acting extremely bizarre, but I shut down successfully before it could do more than complain a lot. At the time I was in the middle of scrubbing a btrfs volume after a system lock up. I think, but am not certain, that I saw "A5" on my motherboard, which according to the manual indicates "SCSI reset", but I had an extremely bad viewing angle. I don't know whether what I think I saw was correct, nor whether the system was again in the process of a meltdown.

In general there is no activity that seems to be able to instantly trigger this, though in each case, if my memory serves me well, there was a constant stream of data either to or from a HDD.

My set up is that I have 5 HDDs and an SSD. One HDD has Windows, my fallback installation of Arch, and daily backups. Another drive, which I installed just a few days ago, currently has only a single 50GB btrfs partition that is not part of any larger volume. I also have three other drives with four partitions each, every partition belonging to a btrfs volume. At the top is a 175GB volume in a RAID0 configuration, after that is a 25GB partition in RAID1, then a 650GB partition (RAID1), and the remainder of the disk (about 80GB) is in single. The RAID0 volume holds games and is sometimes used for short-term storage (a few hours). The 25GBx3 RAID1 is used for Arch as well as my ~/.local, ~/.cache, ~/.thunderbird, ~/.mozilla, and ~/.config. The 50GB partition I mentioned is currently doing the job of this volume, and was not mounted last night, but has been before. The 650GBx3 is used for long-term storage of just about everything; in particular, ~ is mounted here. The single-configured volume is used for downloads and for storing ISOs and installers. The SSD is used with bcache to cache the partitions in the RAID0 volume. I DO NOT think this is a bcache issue, since the RAID0 volume has never had any errors.

My motherboard is an ASUS X79-DELUXE, which has the following SATA controllers: Intel X79 Express Chipset, Marvell 9230, ASMedia 1061.

[edit] Oh, I'm on kernel 3.19.2 and btrfs is 3.19.

[edit2] Also, I converted the RAID0 to a single in case that matters.

[edit4] I finished scrubbing and copied my "journalctl -b" output to pastebin. I can't see anything that might raise any flags. http://pastebin.com/WXpbpTuc
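
Since a full "journalctl -b" dump is long, a small filter can make storage-related messages easier to spot. The Python sketch below keeps only lines that carry a libata-style prefix such as "ata9.00:" or the text "I/O error". The pattern, the keyword, and the name `storage_lines` are assumptions chosen for illustration, not an exhaustive list of what the kernel can log.

```python
import re

# Assumption: libata messages are prefixed like "ata9:" or "ata9.00:".
ATA_PREFIX = re.compile(r"\bata\d+(?:\.\d+)?:")


def storage_lines(lines):
    """Return the lines that look like ATA link or block I/O error messages."""
    hits = []
    for line in lines:
        if ATA_PREFIX.search(line) or "I/O error" in line:
            hits.append(line.rstrip("\n"))
    return hits
```

It accepts any iterable of lines, so it can be run over a saved copy of the journal or over `sys.stdin` when the output of `journalctl -b -k` is piped in. An empty result only means nothing matched these two patterns; it does not prove the hardware is healthy, especially if the journal could not be written during the lock up.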

Last edited by nstgc (2015-04-05 16:46:00)

#2 2015-05-22 18:26:53

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 14,893

Re: System lock ups when using hard disks

nstgc: it's hard to tell if your problems are memory related, but memtest often needs multiple passes to detect errors.
How many passes did you let it make?

Is it possible for you to take out, say, half of the memory modules and see if that changes anything?

Last edited by Lone_Wolf (2015-05-22 18:27:04)

