Folks, I have a unique and challenging problem that has exhausted my Arch Linux skills, and so I am now turning to you.
I have a vintage Pentium Pro 200 system (that’s 200 MHz folks! – 200 MHz 686 architecture – the original 686!), two CPUs, running a dual boot between Windows NT 4.0 and Arch Linux Duke (2007). It has 512 MB of RAM and a 120 GB hard drive, partitioned up between Windows NT and Linux. I built this system new in 2007, hence the dated version of Arch. It has run like a charm all these years, granted not getting that much use. After about a year of no use at all, I fired the system up last week to help with a little research for a blog post I was writing on networking Windows NT 4.0 and Mac OS 8.6. Windows NT 4.0 fired right up with no issue, and after I was done testing what needed to be tested I tried to boot over to Arch.
After a year of disuse, Arch unexpectedly and stubbornly refused to boot. The boot process started up just fine, but towards the end, it declared that it could not mount the root file system on the root device and took a kernel panic and stopped. My Arch skills have gotten a bit rusty in the last few years, but I dusted them off and went to work. My guess was a file system or superblock error. Arch wouldn’t boot, but I dragged out my trusty RIPLinux 2.9 Rescue Live CD and fired it up. It came right up and ran, and I was able to mount the Arch partition and view all the files… everything seemed to be there; it just wouldn’t boot. Windows NT 4.0 AND RIPLinux both boot and run on the machine, so the hardware is fine as well.
A little information on the disk layout. Windows NT 4.0 is in the first partition on the hard drive. The extended partition has a second Windows NT 4.0 partition (sort of a /home partition for Windows NT 4.0), followed by the main Arch partition (the one I am trying to boot), followed by a swap partition and then the largest partition, which I use to share data between Arch and Windows NT 4.0 (I have loaded an ext2/3 driver into Windows NT 4.0 and it happily accesses the Linux partitions on the box).
RIPLinux’s e2fsck did find some issues with the Arch partition and I had it repair them all. I checked again afterwards that all the files were still there, and they were. With the partition now known to be clean, and the superblock repaired from one of the backups, all should have been well. However, Arch still wouldn’t (and still won’t) boot.
RIPLinux has a kind of a chain loader function, so I had it attempt to start up Arch for me. However, this was flummoxed by the fact that Arch addresses all my hard drive partitions as /dev/sdax and RIPLinux addresses them as /dev/hdax. Hence, without a common language, it was hard to get the one to start the other. Still, using this function, I have been able to get a crippled version of Arch running on the machine again. No modules had been loaded, and so it could do almost nothing, but there it was (and is), Arch Linux Duke, at the CLI level. From there, I can see all the files, I can move freely in and out of my user account and the root account, but I can’t make the thing actually boot properly.
If you have read this far, you are a trooper. Summarizing what I know, the hardware is good, the file system is clean, the superblock is good, I can mount it cleanly from a live CD and I can chain load a crippled version of Arch. Here is the boot process blow-by-blow. When I try to do a normal boot, the Windows NT 4.0 loader passes control to the Lilo boot sector I have placed on hda1 (sda1 in Duke’s parlance). Lilo takes over, presents a menu and, when I select Duke, takes off. Arch Linux Duke starts to boot. It gets a good long way along, all the way to:
:: Loading udev events [Pass]
:: Mount root Read-only
:: Checking file systems
This is where it stops.
The next thing I see is:
/dev/sda6
The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else) then the superblock is corrupt and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
I then get a sort of character based splash screen that says
**********FILE SYSTEM CHECK FAILED ****************************
*
* Please repair manually and reboot. Note that the root file system
* is currently mounted read-only. To remount it read-write, type:
* mount -n -o remount,rw /. When you exit the maintenance
* shell, the system will reboot automatically
*
*****************************************************************************
Give root password for maintenance
At this point, I give the root password and enter the maintenance shell as root. I typed in “mount” and the first entry I got back is
/dev/sda6 on / type ext3 (rw)
This is exactly the root partition that the start up complains about. It is clearly there. I can see it, I can walk around it… it is clearly there. Why won’t it boot? Despite the message, the superblock is fine – it passes every test e2fsck can throw at it.
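For anyone wondering where the 8193 in that message comes from, here is a minimal sketch. It assumes the usual mke2fs defaults (8 × block size blocks per group, and a first data block of 1 only for 1 KiB blocks); the function name is made up for illustration, and a filesystem created with non-default options may keep its backups elsewhere.

```shell
# Sketch: block number of the first backup superblock (start of block group 1).
# Assumption: default mke2fs geometry, i.e. blocks_per_group = 8 * block_size.
first_backup_superblock() {
    block_size=$1
    blocks_per_group=$(( 8 * block_size ))
    # With 1 KiB blocks the primary superblock occupies block 1, so every
    # group start is shifted by one block; with larger blocks it is not.
    if [ "$block_size" -eq 1024 ]; then
        first_data_block=1
    else
        first_data_block=0
    fi
    echo $(( blocks_per_group + first_data_block ))
}

first_backup_superblock 1024   # the value quoted in the e2fsck message
first_backup_superblock 4096   # the value to try on a 4 KiB-block filesystem
```

On a filesystem that can still be read, dumpe2fs lists the real backup locations, which is safer than relying on the arithmetic.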
At this point, I ran “e2fsck /dev/hda6” (which is how RIPLinux would have passed it into Arch) and it reported the filesystem as “clean”. I suspect that the superblock message is because Arch sees root as sda6, while RIP passed it in as hda6...
Deciding to see what Arch would be seeing as it tried to set things up in the boot sequence, I tried the following next:
# mknod "/dev/root2" b 3 6
(“3” because RIPLinux refers to my hard drive as IDE, while Arch refers to it by major number “8”, which is SCSI. By the way, it IS an IDE drive – not sure why Arch insists on using the sdx nomenclature instead of hdx)
Then I entered “mount /dev/root2 /mnt/hda6” and “ls /mnt/hda6”.
All was well. I can make the node, I can mount it, and I can see the contents. All is clearly well, but something is clearly wrong enough that Arch can’t boot.
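As an aside, the major and minor numbers used with mknod follow a fixed scheme, so the same partition gets very different numbers depending on whether it is named hdX or sdX. A rough sketch, assuming the classic static allocation (major 3 for hda/hdb, major 22 for hdc/hdd, major 8 for sda through sdp with 16 minors per disk); the helper name is hypothetical:

```shell
# Sketch: print "major minor" for a classic IDE (hdX) or SCSI (sdX) device name.
# Assumption: the traditional static device number allocation described above.
dev_numbers() {
    name=${1#/dev/}
    letter=$(printf '%s' "$name" | cut -c3)
    case $name in
        hd[a-d]*)
            part=${name#hd?}
            case $letter in
                a) major=3;  base=0  ;;
                b) major=3;  base=64 ;;
                c) major=22; base=0  ;;
                d) major=22; base=64 ;;
            esac
            ;;
        sd[a-p]*)
            part=${name#sd?}
            # Disk index: a=0, b=1, ...; each disk owns 16 minor numbers.
            index=$(( $(printf '%d' "'$letter") - 97 ))
            major=8
            base=$(( index * 16 ))
            ;;
        *)
            echo "unsupported device name: $1" >&2
            return 1
            ;;
    esac
    # A name without a partition number refers to the whole disk (offset 0).
    echo "$major $(( base + ${part:-0} ))"
}

dev_numbers hda6   # the numbers used in the mknod call above
dev_numbers sda6
```

Under these assumptions, the mknod call above is the same as `mknod /dev/root2 b $(dev_numbers hda6)`.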
I am totally out of ideas. I have tried every trick I know and am out of tricks. I would welcome any insights as to what I could try to get this venerable Arch installation back on its legs.
By the way, the key section of the /etc/lilo.conf file (lest anyone want to know) is:
#
image = /boot/vmlinuz26
root = /dev/sda6
label = ArchLinux-Duke
initrd = /boot/kernel26.img
read-only
#
I am stumped. Thanks in advance for any and all pointers you may be able to offer.
Last edited by mac57 (2014-06-02 17:42:21)
Cast off the Microsoft shackles Jan 2005
Offline
You missed out on a lot, lol. Linux now uses the SCSI subsystem as a generic way to talk to IDE drives. Hence they appear as /dev/sd* nodes even though they are not SCSI devices. This change was made quite a while ago.
My best guess is this: you have a rather old Arch installation, and maybe you once (or twice) booted a newer live CD that mounted your root partition. It is known that when an ext3 filesystem is mounted by a newer driver, the driver can set new feature flags, which breaks backward compatibility with older drivers.
I suggest you chroot into the Arch installation and update the ext utilities (e2fsprogs). It could be that fsck is stumbling over these new flags.
A less intrusive alternative would be to just disable the fsck altogether. I cannot see that you tried that.
fs/super.c : "Self-destruct in 5 seconds. Have a nice day...\n",
Offline
If I'm reading it right, you haven't been updating your Arch installation at all, right?
Offline
Rexilion wrote: A less intrusive alternative would be to just disable the fsck altogether. I cannot see that you tried that.
Thanks, this sounds like the most efficient test at this point. I am quite comfortable tromping about startup/config files, but I am not sure where the file system check is initiated. Would you be able to advise the name and location of the startup file that does this?
Cast off the Microsoft shackles Jan 2005
Offline
If I'm reading it right, you haven't been updating your Arch installation at all, right?
That is correct. After I got Arch installed, configured and stable, I didn't continue with ongoing updates. Given that the machine is a 200 MHz Pentium Pro, which is a pretty low spec environment to be running a modern OS in, I didn't want to take the ongoing risk of reducing performance and/or breaking it outright as a result of a recent update. Essentially, I froze the system once I had it stable, to keep it stable.
The main reason for putting Arch on the system in the first place was to gain USB and Firewire access to the system (I built in a USB/Firewire combo card for this purpose). Windows NT 4.0 does not support either USB or Firewire due to its early release date, and hence another OS was needed. Arch appeared to be able to run in this very "primitive" hardware environment and so I loaded it, as my enabler for USB and Firewire access. It met this need perfectly, and with astonishingly good performance, and keeping it unconditionally stable then became the paramount concern. Hence after a few weeks of updates, I ceased updating.
Cast off the Microsoft shackles Jan 2005
Offline
The superblock could not be read or does not describe a correct ext2 filesystem.
Is that a typo, or is it normal for older fsck to treat ext3 filesystems as ext2?
But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist.
-Lysander Spooner
Offline
What you are doing here is pointless. Nobody will be able to solve issues with a 7 year old system. Besides, a current Arch Linux will install and work just fine on 512MB RAM (just don't try to run KDE or GNOME on it).
Offline
What you are doing here is pointless. Nobody will be able to solve issues with a 7 year old system. Besides, a current Arch Linux will install and work just fine on 512MB RAM (just don't try to run KDE or GNOME on it).
The thread starter indicated that part of the reason for the freeze was to keep things stable. Yes, I would also update a 7 year old system (I'm writing this from a roughly 10.5 year old system that I keep updated).
But if others decide not to, that does not void any warranty whatsoever (because there isn't one). And the forum rules do not forbid anyone from asking for help with an aged system.
Rexilion wrote: A less intrusive alternative would be to just disable the fsck altogether. I cannot see that you tried that.
Thanks, this sounds like the most efficient test at this point. I am quite comfortable tromping about startup/config files, but I am not sure where the file system check is initiated. Would you be able to advise the name and location of the startup file that does this?
There is a binary in PATH called fsck which calls all the others for each separate (and known) filesystem. Mine is in /usr/bin/fsck, but I think yours should be in /sbin/fsck. I suggest you relocate that fsck and create a new fsck as a symlink to the true binary:
ln -s $(which true) /sbin/fsck
Something like that should work, given that I remembered the old location of fsck.
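Spelled out a bit more, the "relocate, then symlink" idea might look like the sketch below. The function name and the fsck.real backup name are invented for illustration, and the directory is a parameter so it can be tried on a scratch directory before touching /sbin.

```shell
# Sketch: move the real fsck aside and put a symlink to true in its place.
# Assumptions: fsck lives in the given directory (default /sbin) and a true
# binary exists at /bin/true or /usr/bin/true. "fsck.real" is a made-up name.
stub_out_fsck() {
    sbin_dir=${1:-/sbin}
    true_bin=
    for candidate in /bin/true /usr/bin/true; do
        if [ -x "$candidate" ]; then
            true_bin=$candidate
            break
        fi
    done
    if [ -z "$true_bin" ]; then
        echo "no true binary found" >&2
        return 1
    fi
    # Refuse to run twice, so the saved original is never overwritten.
    if [ -L "$sbin_dir/fsck" ] || [ ! -e "$sbin_dir/fsck" ]; then
        echo "$sbin_dir/fsck is missing or already a symlink" >&2
        return 1
    fi
    mv "$sbin_dir/fsck" "$sbin_dir/fsck.real" &&
        ln -s "$true_bin" "$sbin_dir/fsck"
}
```

To undo it, remove the symlink and move fsck.real back to fsck.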
fs/super.c : "Self-destruct in 5 seconds. Have a nice day...\n",
Offline
Thanks Rexilion. Actually, I think the idea of disabling the fsck entirely is the easiest place to start. I am quite certain that I could not update just the e2fsprogs package without updating half the rest of the system, due to dependencies.
Does anyone know which startup script launches the fsck? I could simply comment that line out and see what happens. Thanks.
Cast off the Microsoft shackles Jan 2005
Offline
I'm using systemd right now. So I have no idea where the old sysinit scripts are located.
I think that replacing fsck with true is the easiest thing to do in order to disable fsck.
Another option would be (if this is honoured):
man fstab
The sixth field (fs_passno).
This field is used by the fsck(8) program to determine the order
in which filesystem checks are done at reboot time. The root
filesystem should be specified with a fs_passno of 1, and other
filesystems should have a fs_passno of 2. Filesystems within a
drive will be checked sequentially, but filesystems on different
drives will be checked at the same time to utilize parallelism
available in the hardware. If the sixth field is not present or
zero, a value of zero is returned and fsck will assume that the
filesystem does not need to be checked.
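If the old init scripts do honour that field, zeroing it for the root entry could be sketched as follows. The function name is hypothetical; it only prints the modified table to stdout, so the result can be reviewed before replacing /etc/fstab. Note that awk rewrites the changed line with single spaces between fields.

```shell
# Sketch: print an fstab with the sixth field (fs_passno) set to 0 for one
# mount point. Comment lines and short lines pass through untouched.
# Usage: disable_fsck_pass FSTAB_FILE MOUNT_POINT
disable_fsck_pass() {
    awk -v mp="$2" '
        /^[ \t]*#/ || NF < 4 { print; next }
        $2 == mp {
            $6 = 0                   # fs_passno: 0 means "do not check"
            if ($5 == "") $5 = 0     # fill in fs_freq if it was missing
        }
        { print }
    ' "$1"
}

# Example (writes to a scratch file, not to /etc/fstab):
# disable_fsck_pass /etc/fstab / > /tmp/fstab.nofsck
```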
Last edited by Rexilion (2014-05-01 07:04:40)
fs/super.c : "Self-destruct in 5 seconds. Have a nice day...\n",
Offline
Very interesting Rexilion - thanks. I will give this a whirl.
Cast off the Microsoft shackles Jan 2005
Offline
Folks, thanks for all your helpful comments, and I wanted to report back to you that I finally overcame the issue, and ArchLinux-Duke (2007) is once again executing flawlessly on my old Pentium Pro 200 system. I won't bother reporting here all the blind alleys I went down as I tried to figure out what was wrong, but in the end, literally moments before I was about to give up and overwrite my Arch installation with a new Linux variant (antiX seemed well suited for such old and low power hardware), my attention was drawn to a note I had made in my files back in 2007 about a problem with similar symptoms. In that case, I had just deleted ZenWalk Linux from the hard drive (both Arch and Zen had been on the drive), and merged several partitions to make use of the newly free space. This had changed Arch's view of the drive lettering, and what had been its /dev/sddx root device was now /dev/sdcx. Arch failed to boot, throwing off the same errors I was seeing now. I wish I had recalled that note a month or so ago! It would have saved me a lot of work and a lot of frustration.
At any rate, as a last step, and testing the idea that maybe the drive lettering had changed for some reason, I repeatedly manually booted Arch, specifying root=/dev/sda6, then /dev/sdb6, then /dev/sdd6, and finally, /dev/sdc6. Eureka! Arch now considered itself to be on /dev/sdc6 whereas previously it had been on /dev/sda6. This got me part way there, but the boot failed at the filesystem check stage and threw me into root. I disabled the file system check in /etc/rc.sysinit and got farther. Then I cleaned up /etc/fstab to agree with the new sdc naming, and I was back on the air fully.
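For anyone facing the same cleanup, the fstab part can be sketched as below. This is an illustration rather than the exact commands I typed; the function name is made up, and it prints the result instead of editing the file in place so it can be checked first.

```shell
# Sketch: print an fstab with every /dev/<old>N entry renamed to /dev/<new>N.
# Only lines that start with the old device name are changed.
# Usage: rename_disk_in_fstab FSTAB_FILE OLD_DISK NEW_DISK
rename_disk_in_fstab() {
    sed "s|^/dev/$2\([0-9]\)|/dev/$3\1|" "$1"
}

# Example (writes to a scratch file for review):
# rename_disk_in_fstab /etc/fstab sda sdc > /tmp/fstab.sdc
```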
So, what had happened was that Arch had changed its view of the drive it was on from sda6 to sdc6. While I could not understand why this "sudden" change had occurred, at least I had a solution, and had Arch back up and running.
Trolling through the rest of my notes, I found the answer. In 2012, the Tekram SCSI card in the machine failed, and I ultimately replaced it with an Adaptec card. The Tekram card did not have a BIOS segment on it. The Adaptec card did. My guess is that this caused the two internal SCSI devices I have built into the system (Iomega ZIP and Jaz respectively) to be enumerated first, claiming the "sda" and "sdb" device names. That left "sdc" for the root device, and that is where Arch went next. This is my guess anyway.
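To illustrate the guess: sdX letters are simply handed out in the order the disks are detected, so anything detected ahead of the hard drive pushes it down the alphabet. A toy sketch (it touches no real hardware; the function name and device labels are invented):

```shell
# Sketch: hand out sdX names in detection order, as described above.
# The arguments are free-form labels for the devices, in the order found.
assign_sd_names() {
    letters=abcdefghijklmnopqrstuvwxyz
    i=1
    for dev in "$@"; do
        letter=$(printf '%s' "$letters" | cut -c"$i")
        echo "sd$letter $dev"
        i=$(( i + 1 ))
    done
}

assign_sd_names ide-disk zip jaz   # detection order that would match the old setup
assign_sd_names zip jaz ide-disk   # my guess at the order with the Adaptec card
```

In the second ordering the hard drive ends up as sdc, which matches what I saw.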
I should have caught this issue back in 2012, at the time, but from my notes, I can see that I tested the new card thoroughly using the Windows NT 4.0 side of the machine, but never thought to bring up Arch as well. Hence, this problem lay dormant for two years, before I attempted to fire up Arch last month and blundered right into it.
It has not all been bad. I have learned more about the ext2 and ext3 file systems and superblocks in the intervening time than I will ever need to use. I have learned how to manually boot Linux on a machine whose BIOS is so old that it cannot address the disk cylinder that the kernel is on, and I have completely refreshed the many general Linux skills that used to just flow from my fingertips. It has been a frustrating experience, but ultimately a successful and useful one.
Just wanted to let everyone know that this is now [SOLVED]. I would mark the post as such, but I don't see any obvious way to do that. Thanks again everyone.
Cast off the Microsoft shackles Jan 2005
Offline
Just wanted to let everyone know that this is now [SOLVED]. I would mark the post as such, but I don't see any obvious way to do that. Thanks again everyone.
Edit the title of your first post and put "[SOLVED]" at the beginning.
Fascinating story BTW --- I might try this method on my dad's old computer...
Velocitas Eradico
Offline
Thanks for the pointer Head_on_a_Stick. I have updated the title to reflect the [SOLVED] status.
Cast off the Microsoft shackles Jan 2005
Offline