
#1 2011-05-10 00:35:37

rykki
Member
Registered: 2011-05-10
Posts: 3

System becomes unusable after a few hours?

Hey archers,

I've been on Arch for about two months and didn't start having problems until maybe three weeks ago.  After a few hours of use, everything suddenly becomes really unresponsive: even an uncached ls or man takes upwards of 15-30 seconds.  Using strace, I saw the commands stalling on I/O operations (getdents64, for example), so I came to suspect my disk.  The odd thing is, it completely resolves itself upon reboot, at least until I leave the computer on for a few more hours.
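For reference, this is the kind of trace I ran (a sketch; the directory name is just an example):

```shell
# Time each syscall of a slow directory listing: -T appends the time
# spent inside every call, and -c prints a summary table instead.
strace -T ls /usr/share 2>&1 | tail -n 20
strace -c ls /usr/share
```

With -T, a healthy getdents64 should take well under a millisecond; mine were taking whole seconds.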

About my disk: it's a Crucial 64GB SSD; / and /home are mounted with noatime, and both partitions are at about 65% capacity.  I profiled the disk with dd before and after rebooting, the most recent time it became unresponsive:

Before power cycle:

andy@vulpes ~ $ dd if=/dev/zero of=temp bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 214.902 s, 1.2 MB/s
andy@vulpes ~ $ sudo -i
Password: 
[root@vulpes ~]# echo 3 > /proc/sys/vm/drop_caches 
[root@vulpes ~]# logout
andy@vulpes ~ $ dd if=temp of=/dev/null bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 426.4 s, 630 kB/s

After power cycle:

andy@vulpes ~ $ dd if=/dev/zero of=temp bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 3.01267 s, 89.1 MB/s
andy@vulpes ~ $ sudo -i
[root@vulpes ~]# echo 3 > /proc/sys/vm/drop_caches 
[root@vulpes ~]# logout
andy@vulpes ~ $ dd if=temp of=/dev/null bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 1.22105 s, 220 MB/s

Notice the order-of-magnitude difference in throughput (read: 630 kB/s vs. 220 MB/s; write: 1.2 MB/s vs. 89.1 MB/s), and that doesn't even capture the latency I'm seeing.
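For anyone repeating this test: going through the page cache can flatter the numbers even with drop_caches, so here's a sketch that bypasses it entirely (assuming GNU dd and a filesystem that supports O_DIRECT):

```shell
# Write and then read 256 MiB with the page cache bypassed, so the
# throughput reflects the device itself rather than RAM.
dd if=/dev/zero of=temp bs=1M count=256 oflag=direct
dd if=temp of=/dev/null bs=1M count=256 iflag=direct
rm temp
```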

Has anyone seen anything like this, or have any ideas on how to fix or debug it further?  The fact that it fixes itself after a power cycle makes me think it isn't the disk itself, but who knows: a power cycle resets the disk to some extent too (remount, cleared caches, etc.).  My other thought is that a kernel upgrade might be to blame, but I don't remember whether this started after I upgraded my kernel.  If it matters, I started using GNOME 3 about a week before this began.  I don't have much Linux profiling-fu, so any help would be appreciated.

Thanks!

Offline

#2 2011-05-10 03:01:08

Scars
Member
Registered: 2011-05-01
Posts: 10

Re: System becomes unusable after a few hours?

Have you tried running fsck on your filesystems?

Offline

#3 2011-05-10 03:27:53

rykki
Member
Registered: 2011-05-10
Posts: 3

Re: System becomes unusable after a few hours?

Scars wrote:

Have you tried running fsck on your filesystems?

I ran an fsck -n and it showed errors, so I ran shutdown -Fr now to force a check... three times.  Most of the errors have been fixed, but fsck is still reporting some:

andy@vulpes ~ $ sudo fsck -n
Password: 
fsck from util-linux 2.19
e2fsck 1.41.14 (22-Dec-2010)
Warning!  /dev/sda3 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sda3 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 1414483 has zero dtime.  Fix? no

Inodes that were part of a corrupted orphan linked list found.  Fix? no

Inode 1414512 was part of the orphaned inode list.  IGNORED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -5676745 -5681578
Fix? no

Free blocks count wrong (2458077, counted=2457987).
Fix? no

Inode bitmap differences:  -1414483 -1414512
Fix? no

Free inodes count wrong (1302486, counted=1302456).
Fix? no


/dev/sda3: ********** WARNING: Filesystem still has errors **********

/dev/sda3: 300010/1602496 files (1.5% non-contiguous), 3943825/6401902 blocks

Could the filesystem errors actually cause this, or are the errors just another symptom of a bad disk?
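In case it helps anyone following along: e2fsck -n on a mounted filesystem can report stale or spurious errors, so a check scheduled at boot is more trustworthy.  A sketch, assuming Arch's initscripts honour the /forcefsck flag file:

```shell
# Ask the boot scripts to run a full fsck on the next boot, then reboot;
# the flag file is removed automatically once the check has run.
sudo touch /forcefsck
sudo reboot
```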

Last edited by rykki (2011-05-10 05:12:43)

Offline

#4 2011-05-16 15:25:30

yungtrizzle
Member
Registered: 2011-04-24
Posts: 139

Re: System becomes unusable after a few hours?

It could be that your hard drive is losing its ability to spin up after it spins down; the fact that it only happens after extended use points that way.  Plain fsck is not very accurate; there's a better version in the e2fsprogs package.

Last edited by yungtrizzle (2011-05-16 15:25:54)

Offline

#5 2011-05-16 15:44:40

laloch
Member
Registered: 2010-02-04
Posts: 186

Re: System becomes unusable after a few hours?

yungtrizzle wrote:

It could be that your hard drive is losing its ability to spin up after it spins down.

It's an SSD :D

Offline

#6 2011-05-16 15:51:55

laloch
Member
Registered: 2010-02-04
Posts: 186

Re: System becomes unusable after a few hours?

I would try installing kernel26-lts to see whether the problem is kernel (software) or hardware related.
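A minimal sketch of what I mean (double-check the package and image names against the wiki; they may have changed):

```shell
# Install the long-term-support kernel alongside the stock one;
# it does not replace your current kernel.
sudo pacman -S kernel26-lts
# Then add a bootloader entry pointing at the LTS kernel image,
# e.g. /boot/vmlinuz26-lts with initrd /boot/kernel26-lts.img.
```

If the slowdown disappears under the LTS kernel, it's almost certainly a regression in the newer kernel rather than the hardware.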

Offline

#7 2011-05-16 16:40:59

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,804

Re: System becomes unusable after a few hours?

rykki wrote:

.... it completely resolves itself upon reboot...
....Before power cycle:...
...After power cycle:...

You said reboot, but you seem to indicate this was accomplished by a full shutdown to a power-off state.  What happens with a warm boot?

Last edited by ewaller (2011-05-16 16:41:33)


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way

Offline

#8 2011-06-04 07:33:29

rykki
Member
Registered: 2011-05-10
Posts: 3

Re: System becomes unusable after a few hours?

Sorry for not replying for a while; I was traveling for a few weeks... still seem to be having the problem, though :<

laloch wrote:

I would try to install kernel26-lts to see if the problem is kernel (SW) or HW related.

So this just means a more stable kernel?  I'll try it when I get a chance.

ewaller wrote:
rykki wrote:

.... it completely resolves itself upon reboot...
....Before power cycle:...
...After power cycle:...

You said reboot, but you seem to indicate this was accomplished by a full shutdown to a power-off state.  What happens with a warm boot?

I have to reboot using the reset switch on the case (connected to the motherboard).  I'm not sure whether that counts as a warm reboot (i.e. whether the SSD actually gets reset), but since the system gets so locked up it seems like my warmest option.

Offline
