I'm having this really weird problem with my RAID controller, and believe it or not, I think Linux is somehow causing it :S
For the last few months I've been using a RAID0 array connected to my onboard GSATA controller as a system drive. This is the motherboard http://www.newegg.com/Product/Product.a … 6813128375
It has two SATA controllers; the other one is an Intel controller.
Up until a couple of weeks ago I only had one OS on this computer: Windows 7, installed on a partition on the RAID array. The array is called TERARAID (just for future reference in this post).
Until the latest kernel upgrade I was able to mount TERARAID by activating it with dmraid. Since then it hasn't been able to start in Linux.
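For reference, this is roughly what I was doing before it broke (the mapper name and mount point below are just placeholders, not my actual device names; the real name depends on the metadata format dmraid detects):
dmraid -ay                                  # activate all fakeraid sets dmraid can find
mount /dev/mapper/teraraid_part3 /mnt/tera  # placeholder device and mount point, just to show the idea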
But that's not all, because Linux seems to make the RAID controller malfunction temporarily, so badly that TERARAID can't even be activated during POST.
I'm gonna try my best to explain this.
Let's just start with when I first turn on the computer.
It goes through POST as normal and I can see TERARAID being activated in the GSATA BIOS.
Then I proceed to boot into Windows, which is on a partition on TERARAID. Everything works fine: I can access all my data on the drives and the OS is completely stable, so it doesn't look like a drive failure at all.
Note: I can reboot from Windows back into Windows over and over again; the RAID never fails.
The weirdness starts as soon as I boot into Linux. I run dmraid -ay to activate the array, but only one raided drive is found. Both drives in the array are detected, but only one of them seems to be part of a RAID array.
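For reference, this is more or less what I run to check it (standard dmraid options, nothing exotic):
dmraid -r     # list the block devices that carry RAID metadata; only one of the two disks shows up here
dmraid -s     # show the discovered RAID sets and their status
dmraid -ay    # try to activate every set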
At first I just thought I had messed something up when I updated the kernel, BUT when I reboot from Linux and POST goes through the RAID menu, TERARAID fails to start, during POST.
Both drives are detected, but only one of them is raided. It's always the same drive too; I even tried switching the cables.
TERARAID just keeps failing until I turn off the computer and do a cold boot. Then everything's fine again: I can boot into Windows and TERARAID works perfectly. Then I boot into Linux and it starts all over again.
I did some searching and found out people have been having problems with the Gigabyte SATA controller, so I thought I'd try moving the drives to the Intel controller. Guess what... it behaves exactly the same way.
This is just the strangest computer problem I have ever encountered, and I work in a PC repair shop.
I'm hoping someone could point me in the right direction, like what log files I could check to find some answers.
Check /proc/mdadm.stat and /etc/mdadm.conf for raid data.
Check /proc/mdadm.stat and /etc/mdadm.conf for raid data.
Hmmm... mdadm.conf only had one line that's not commented out: "DEVICE partitions".
I tried adding "DEVICE /dev/sde* /dev/sdd*" (the raided drives), but nothing changed; I didn't think it would.
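So the uncommented part of /etc/mdadm.conf currently looks like this:
DEVICE partitions
DEVICE /dev/sde* /dev/sdd*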
I might try making some more changes to the config. Thanks.
mdadm.stat doesn't exist. Is this a log file or something?
Perhaps I made an oops...
Maybe /proc/mdstat is better.
Perhaps I made an oops...
Maybe /proc/mdstat is better.
nope... it's not there either
I did a locate *m*stat and nothing mdadm-related comes up :S
Sorry..oops
Use cat /proc/mdstat; it lists all RAID arrays and their characteristics, as I recall.
Sorry..oops
Use cat /proc/mdstat; it lists all RAID arrays and their characteristics, as I recall.
hehe... there is no mdstat so that's not gonna work.
I have nothing useful to add except to say that dmraid (fakeraid) is not the same as mdadm (software raid); /proc/mdstat applies to the latter.
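A quick way to see which of the two you're actually dealing with (device names will vary):
ls /dev/mapper/     # dmraid-activated (fakeraid) sets show up here as device-mapper nodes
cat /proc/mdstat    # only mdadm/md software arrays are listed here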
I see.
I thought dmraid was a front end for mdadm or something... idk.
Maybe I should be using mdadm then?
There is a wiki for dmraid.
One command there is dmraid -tay, for raid data.
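If I remember right, the t is test mode, so roughly:
dmraid -tay    # show what would be activated, without actually activating anything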
There is a wiki for dmraid.
One command there is dmraid -tay, for raid data.
Ah, thanks. I guess I should read more.
I already did a clean install though; the last one was my first time installing Arch, so I thought I'd give it a second try.
It was worth it, since it fixed some other issues, but not this one.
Now I'm working on a fresh setup and haven't even installed dmraid. Still, the RAID array fails as soon as I reboot from Arch.
I did update the kernel right away after setup, though. Maybe it's something in the new kernel that's causing this?
I thought I'd check the kernel log and found something that looks interesting. It might be nothing though; I'm not very used to reading Linux logs.
Dec 2 23:46:35 localhost kernel: ata5: SATA max UDMA/133 abar m2048@0xfbffc000 port 0xfbffc280 irq 40
Dec 2 23:46:35 localhost kernel: ata6: SATA max UDMA/133 irq_stat 0x00400040 , connection status changed
I'm pretty sure these are the ports (5 and 6) the drives are hooked up to. This "connection status changed" thing just looks wrong; could this be where the drive is removed from the RAID array?
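In case it helps, this is roughly how I'm pulling those lines out (the exact log path depends on the syslog setup):
grep -iE 'ata[56]|connection status' /var/log/kernel.log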
I'll have to get to bed now but tomorrow I'm gonna try switching the drives to different ports to verify.
I also posted the entire log here.
http://pastebin.com/cd35Wh8v
I've tried different SATA ports and different cables; still nothing.
I went through kernel.log again, trying to find more info on the drive that keeps failing.
Dec 6 05:04:54 behemoth kernel: sde: sde1 sde2 sde3
Dec 6 05:04:54 behemoth kernel: sde: p3 size 3354071040 extends beyond EOD, enabling native capacity
Dec 6 05:04:54 behemoth kernel: ata9: hard resetting link
and
Dec 6 05:04:54 behemoth kernel: sd 8:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Dec 6 05:04:54 behemoth kernel: sde: detected capacity change from 1000203804160 to 1000204886016
Dec 6 05:04:54 behemoth kernel: sde: sde1 sde2 sde3
Dec 6 05:04:54 behemoth kernel: sde: p3 size 3354071040 extends beyond EOD, truncated
Dec 6 05:04:54 behemoth kernel: sd 8:0:0:0: [sde] Attached SCSI disk
It seems to me that the partition is larger than the size of the disk, which I think would be normal since it's a RAID0, and therefore the kernel has to jump through hoops to add it to the partition table.
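If I do the math on the numbers from the log (assuming I'm reading them right):
echo $(( 3354071040 * 512 ))    # sde3 claims ~1.72 TB
echo $(( 1953525168 * 512 ))    # a single disk is only ~1.00 TB, so the partition can't fit on one member
echo $(( 2 * 1953525168 ))      # but both RAID0 members together give 3907050336 sectors, which is enough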
But this is all Chinese to me, so I'm mostly just guessing :P
There's nothing like this going on for any of the other drives.
I remember that when I first installed Arch with the 2.6.33 kernel, fdisk -l said something about the partition exceeding the size of the disk, but I was still able to activate the array.
Now it just appears as a normal drive:
Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x1c7fba32
   Device Boot      Start         End      Blocks   Id  System
/dev/sde1   *        2048      206847      102400    7  HPFS/NTFS
/dev/sde2          206848   552962047   276377600    7  HPFS/NTFS
/dev/sde3       552962048  3907033087  1677035520    7  HPFS/NTFS
I tried adding
ARRAY /dev/md0 devices=/dev/sdd2,/dev/sde2
to mdadm.conf
then did
mdadm --assemble --scan
but it gave me this:
mdadm: /dev/sde2 has no superblock - assembly aborted
I don't really know what that means.
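Maybe something like this would tell me more; I'm assuming that if there's no md superblock at all, mdadm simply can't assemble anything from these partitions:
mdadm --examine /dev/sde2    # prints the md superblock if one exists, or reports that none was detected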