I am experiencing a weird boot problem on an ACER TRAVELMATE 341T laptop. I have both ARCH and WIN2K installed, with GRUB as a bootloader. Until recently both systems were running without problems.
A few weeks ago I suddenly could not boot ARCH anymore. The GRUB menu shows up, I select Arch, some blurb appears about uncompressing the kernel and then the screen goes BLANK and the laptop REBOOTS. I can still boot my WIN2K just fine, so my hardware seems OK.
My harddisk has the following layout:
/dev/hda1 NTFS (WIN2K)
/dev/hda6 FAT32 (shared between windows and arch)
/dev/hda8 REISERFS (archlinux)
I figured there might be something wrong with the kernel or the boot record. So, I burned `Timo's rescue cd' (http://rescuecd.sourceforge.net/), which btw is an excellent rescue cd. Using the rescue cd, I did the following:
1. Set some conservative values in the /boot/grub/menu.lst -> No changes
2. MANUALLY upgraded my 2.6.10-3 kernel to a more recent 188.8.131.52-1 kernel, by unpacking the pkg.tar.gz to the mounted REISERFS partition -> Now the system no longer reboots, it just displays a `Decompressing kernel' message and FREEZES
3. Reinstalled GRUB to my MBR by typing:
None of this helped so far. So, please oh please, is there anyone out there with some friendly advice?
It may help to know what exactly changed between the moment it worked and the moment it stopped working. So did you dig into the logfiles for interesting info? Anything from kernel warnings/errors to which packages are upgraded.
That said, check your harddisk for badblocks, especially the boot partition or wherever the kernel is.
Thanks for the reply.
I checked the logs, but could not find anything out of the ordinary. No updates in pacman.log that I would suspect of hosing the system.
I checked my Windows partitions with scandisk (including surface scan) and my Linux partitions using `badblocks -n' but found nothing.
After that I converted my SWAP partition (hda7) to EXT2 and copied the contents of my /boot dir (from hda8) onto it. When I tried to run grub setup I got an error saying grub could not mount the partition. How weird is that? The partition was mountable just fine by the kernel.
I retried with EXT3 and got the same error. Finally I formatted it as Reiserfs and managed to get grub to set up to the MBR. However, upon reboot the problem persisted. Of course, I copied the image from (hda8)/boot/, so if it was corrupted before, it still would be now.
Thanks for the help so far. Any other ideas?
EDIT: As for your first question, I can't think of anything significant that might have caused the problem. I might have done a hard powerdown once or twice (I don't recall), but that's about it. Reiserfs should handle that pretty well.
First of all, make a backup of all important data (/etc, /home, `pacman -Q > filelist` is a good start), just in case.
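A minimal sketch of such a backup, assuming a POSIX shell and `tar`. The function name `backup_essentials` and the output file names are made up for illustration; the `pacman -Q` step is simply skipped when pacman is not available (e.g. when working from a rescue CD).

```shell
#!/bin/sh
# Sketch: archive the given directories and save the installed package list.
# Usage: backup_essentials DEST DIR...
# (backup_essentials, pkglist.txt and the archive names are illustrative choices.)
backup_essentials() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1

    # Record installed packages if pacman exists on this system.
    if command -v pacman >/dev/null 2>&1; then
        pacman -Q > "$dest/pkglist.txt"
    fi

    # One compressed archive per directory, named after the directory.
    for dir in "$@"; do
        name=$(basename "$dir")
        tar -czf "$dest/$name.tar.gz" -C "$(dirname "$dir")" "$name" || return 1
    done
}
```

Called as, say, `backup_essentials /mnt/backup /etc /home`, where `/mnt/backup` is a placeholder for any mounted location with enough free space.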
Did you check the filesystems?
Try installing a fresh kernel on the ex-swap partition. Also booting with the earlyprintk=vga option may give some more info. If the kernel can't even decompress then the kernel image may be corrupted somehow. Also try reinstalling Grub to both the root (or better yet: use that ex-swap as boot partition and install Grub to it) and MBR, Grub itself can be corrupted too.
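Adding a boot option means appending it to the `kernel` line of the relevant entry in /boot/grub/menu.lst. Editing by hand works fine; for trying several options in a row, a small helper could look like the sketch below. It assumes a GRUB legacy style menu.lst in which kernel lines start with the word `kernel`; `add_kernel_param` is a made-up name, and the helper touches every kernel line in the file, so keep a copy of the original.

```shell
#!/bin/sh
# Sketch: append a boot parameter to every "kernel" line of a menu.lst file.
# Usage: add_kernel_param FILE PARAM
# (add_kernel_param is an illustrative helper, not part of GRUB.)
add_kernel_param() {
    file=$1
    param=$2
    scratch=$(mktemp) || return 1

    # Append PARAM only to kernel lines that do not already contain it
    # as a separate word; all other lines are passed through unchanged.
    awk -v p="$param" '
        /^[ \t]*kernel[ \t]/ && index(" " $0 " ", " " p " ") == 0 { $0 = $0 " " p }
        { print }
    ' "$file" > "$scratch" && cat "$scratch" > "$file"
    rc=$?

    rm -f "$scratch"
    return $rc
}
```

For example: `add_kernel_param /boot/grub/menu.lst earlyprintk=vga`. Running it twice with the same option does not add the option a second time.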
Well, I still do not know what is going on, but I am getting somewhere.
I installed the 2.4 kernel package (again by manually unpacking the pkg.tar.gz) and this one boots my system just fine.
The 2.6 kernel still hangs my system though, even after reinstalling the package when booted to 2.4 (with hda7 mounted as /boot). The last bit of text I see is:
[Linux-BzImage, setup=0x1600, size=0x2aa51c]
Adding `earlyprintk=vga' to the kernel line does not yield any additional output. It seems that the 2.6 kernel doesn't even get decompressed. What could cause this?
My archlinux packages are on my FAT32 partition (which I checked with scandisk), so I doubt my kernel26 package is corrupted. Still, I am going to copy a kernel26 package from a different computer, just to make sure.
PS: The first thing I did was check my filesystems, of course. All windows volumes were given a surface scan in Win2k and I did a reiserfscheck on hda8. No errors were found.
Very strange, and it worked before with 2.6.10? Did you happen to change anything at all in the BIOS?
Desperate attempt: untar the kernel26 package manually to somewhere else, and copy only the kernel to the ex-swap with a different name and try that one out. Also compare the md5sum of both that file and the other one.
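The checksum comparison can be done by eye, or with something like the sketch below. It assumes `md5sum` from coreutils is available; `same_md5` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only if both files have the same MD5 checksum.
# Returns 1 if the checksums differ and 2 if either file is unreadable.
# (same_md5 is an illustrative helper name.)
same_md5() {
    [ -r "$1" ] && [ -r "$2" ] || return 2

    # Read via stdin so the output is just "<hash>  -" regardless of file name.
    sum_a=$(md5sum < "$1" | cut -d ' ' -f 1)
    sum_b=$(md5sum < "$2" | cut -d ' ' -f 1)

    [ "$sum_a" = "$sum_b" ]
}
```

Usage would be along the lines of `same_md5 /path/to/installed-kernel /path/to/freshly-unpacked-kernel && echo identical`, with both paths being placeholders for the two kernel images. Identical sums mean the installed image matches the one in the package, which points away from a file that got damaged during installation.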
ok, i untar'ed the kernel and copied to a different name like you suggested, but it changed nothing. md5sum's were identical for both files.
then, i copied the 184.108.40.206 kernel from my main box and installed that. now the screen goes blank again and the system reboots after a second or so.
i _really_ do not know what is going on. but i guess i should be happy 2.4 is working. i was thinking about rolling back anyway, since my laptop only has 192mb ram ;-).
btw, I did not change any bios settings, for the simple reason that my ACER bios does not have many settings to change.
thanks for your help so far.
192 MB of RAM isn't that little, and a 2.6 kernel should use hardly more RAM than a 2.4 one.
Something very weird is going on if the 2.6.10 kernel first worked, but now suddenly not anymore. What if you install a 2.6.10 kernel again, does that work? If it does and 2.6.11 doesn't then at least it's clearly a kernel problem (or coincidence is in a corny mood). You could try to pin it down by trying different versions, or just live on and simply use a working version.
just had an idea: could this be an ACPI/APM problem? found something on the Gentoo Boards that indicates this might be the case. will try booting with `acpi=off apm=off noapic' when i get home...
It could be, but it doesn't matter if disabling ACPI/APM/apic works or not if you don't go further and try to find out what's causing this. If you want to help to fix a possible kernel bug then get 2.6.12-rc5 and see if that works. If not, then try the newest -mm kernel. If both don't work then the problem isn't fixed yet and reporting it may be helpful for them.
Just a note: I have had a similar issue before. I compiled a kernel on the weekend, rebooted into it, and it booted fine. The next day it wouldn't boot: I selected it in GRUB and my screen just went blank; the kernel never even started.
I have no idea why it happens, all my other kernels work fine, but it's happened to me before, seems like a random bug.
It doesn't help that I can't check logs for it: the root fs is never mounted, so there cannot be any logs there, nor is there any output from the kernel.
My advice: just chuck another kernel on or reinstall Arch; that usually fixes it.
ok, tried changing the acpi/apm/apic settings, but it didn't help.
iphitus, i get the feeling as well that this bug is pretty random. what i do not understand is how it can be that the kernel does not even unpack.
i will investigate this some more once i get my new laptop screen. the old one is broken, so right now i have to hook up the monitor of my main box each time i want to do something, which is a bit of a hassle. i will try some different kernels and maybe install lilo, to see if that changes something.
thanks for the help. i will report back in a few days.
Well, I changed the screen on my lappy and now my problems have mysteriously disappeared. The computer is booting again without problems.
So, hurrah, I guess. Though this does leave me wondering what could have caused the problem..