How does Linux get written to the hard drive during an installation?

pirateprentice · 2019-08-19 17:57:20

Hey guys. I hope this is the correct sub-forum for this question.
I'm wondering if anyone can help clear up some confusion I have regarding linux in general. I am a new user.

During the arch installation as an example, I don't understand how the installation actually gets written to the hard drive.
Does the `mount` command write the files to the device?
I also don't understand how the operating system itself actually gets written to the drive.

For instance during a new install, once the root partition is mounted, the install guide directs you to run the command

 pacstrap /mnt base

And from this command, I can understand how the tools get written to the drive, but I'm very confused as to how the linux filesystem and operating system get written to the drive, as there is no command like (pseudocode)

Copy CD contents to /dev/sda1

I hope the question is clear. If it is not, please advise me how I can make it more understandable. I would not have made this post if I could find the answer during a web search, but I have looked for several days and cannot find an answer.

Last edited by pirateprentice (2019-08-19 21:12:25)

jasonwryan · 2019-08-19 18:10:11

mount doesn't install anything, it just makes the filesystem available.

You also copied the wrong command, it is `pacstrap` that installs the base system in the Arch installation process: https://wiki.archlinux.org/index.php/In … e_packages

vim /usr/bin/pacstrap

Scimmia · 2019-08-19 18:19:48

pirateprentice wrote:

And from this command, I can understand how the tools get written to the drive, but I'm very confused as to how the linux filesystem and operating system get written to the drive

The filesystem gets created when you create it with mkfs. As for the operating system, why do you think it's different than "tools"?

WorMzy · 2019-08-19 18:20:14

From an Arch Linux perspective: Nothing* from the liveCD gets copied to the install root, and at no point does the installation guide direct you to run `pacman -S base base-devel`. You get directed to run pacstrap, and this is what installs the actual system -- by downloading the packages you have told it to (usually base, or at least a subset of this, and possibly other packages you want) from the mirror you chose earlier on, and installing them into the new root partition.

Read `man 8 mount` to see what `mount` does, although the clue is in the name.

Other distributions may do it differently. Gentoo, for example, provide base OS images which you extract onto your new root partition and then use to install the rest of your system.

* except for pacman's keyring and mirrorlist, unless you explicitly tell pacstrap not to copy these.

Last edited by WorMzy (2019-08-19 18:22:37)

Trilby · 2019-08-19 20:46:09

Just to build on the above a bit - I suspect much of your confusion may be clarified by understanding what `mount` does (it doesn't install anything, but it is an essential prerequisite).

In a Windows system you may have different drives, e.g., your C: drive is generally an internal hard drive, A: and B: are generally "floppies" if you have any, D: is often an optical / cd drive, E: may be a flash drive, etc. At any DOS prompt you are only working within one of these at a time. To copy between drives in the Windows world, you explicitly specify them: `copy myfile D:\path\to\whatever` (or something like that, I barely remember DOS anymore).

In *nix systems, there may be many distinct devices storing data (hard drive, floppies, CDs, flash drives, etc) but they are all represented in a single unified directory tree. Everything is under the same "root" directory '/'.

So if you have two actual hard drives, each with two partitions, you have a total of 4 partitions. In Windows this would have to be represented as 4 different drive "letters" (C:, D:, E:, F: ... and that's assuming Windows used partitions, I don't actually remember that). In *nix, you just have 4 different device nodes. These device nodes, when running a linux system, would be listed under /dev; so you'd have /dev/sda1, /dev/sda2, /dev/sdb1, and /dev/sdb2. There are two actual devices sda and sdb and each one of them (in this example) has two partitions sda1 and sda2, sdb1 and sdb2.

But these are just the device nodes themselves. You can't change directory "into" them as they are not directories, they are devices. You can't really cp anything to them (technically you can, but it would not do what you expect, and would destroy the existing partition if the command was allowed to complete).

To work with the contents of those device nodes, or the files stored on them, they need to be mounted somewhere in the directory tree. For example, we could run the following commands:

mount /dev/sda2 /mnt
mkdir /mnt/boot
mkdir /mnt/home
mount /dev/sda1 /mnt/boot
mount /dev/sdb1 /mnt/home

The first command makes all files present on the sda2 device available under /mnt. For example, if there was a file called "mytext.txt" on sda2, it would now be available at /mnt/mytext.txt.

The next two commands create new directories under /mnt ... but since sda2 is mounted there, it actually creates those directories on the sda2 device. So now the sda2 device would contain "mytext.txt" and two directories "home/" and "boot/".

Next we mount sda1 on /mnt/boot, so any content on the sda1 device will now be available under /mnt/boot.

More importantly, and more directly relevant to the installation process, anything written to /mnt/boot/ will be stored on the sda1 device. So when you create a file /mnt/boot/syslinux.cfg it actually creates a file called "syslinux.cfg" in the top level directory of the sda1 device.

Continuing on, when vmlinuz-linux (the actual linux kernel) is written to /mnt/boot/ it is actually stored on the sda1 device. When the bash shell executable is written to /mnt/usr/bin/bash it is actually stored on sda2. And it is stored under /usr/bin/ at the base of the sda2 device.
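A minimal sketch of the rule described above: a path is stored on whichever device is mounted at the longest matching mount point. The mount table simply mirrors the example commands, and `device_for_path` is a hypothetical helper name, not a real tool. It only does string matching and touches no real devices.

```shell
#!/bin/sh
# Hypothetical mount table mirroring the example commands: "device mountpoint".
MOUNT_TABLE='/dev/sda2 /mnt
/dev/sda1 /mnt/boot
/dev/sdb1 /mnt/home'

# device_for_path PATH
# Prints the device whose mount point is the longest prefix of PATH.
device_for_path() {
    path=$1
    best_dev=
    best_len=0
    while read -r dev mp; do
        case "$path" in
            "$mp"|"$mp"/*)
                # Keep the deepest (longest) mount point that contains PATH.
                if [ "${#mp}" -gt "$best_len" ]; then
                    best_dev=$dev
                    best_len=${#mp}
                fi
                ;;
        esac
    done <<EOF
$MOUNT_TABLE
EOF
    printf '%s\n' "$best_dev"
}

device_for_path /mnt/boot/syslinux.cfg    # /dev/sda1
device_for_path /mnt/usr/bin/bash         # /dev/sda2
device_for_path /mnt/home/user/notes.txt  # /dev/sdb1
```

On a running system, `findmnt --target /some/path` answers the same question using the kernel's real mount table.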

Just imagine directory trees that can be attached, detached, and reattached to each other with the single rule that there is only ever one actual root of the tree at any given moment in time.

When you boot the installation medium, you create filesystems and mount them under /mnt/ similar to the example above. Lots of executables are then written (by pacman / pacstrap) to the location /mnt/usr/bin/. Then you chroot (using arch-chroot) which stands for "change root": you change what part of the tree gets to be the actual root for the time being to /mnt/. From that point on, what was written to /mnt/usr/bin/ can then be found at /usr/bin/.
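To make the "change root" step concrete, here is a small sketch of how a path seen from the installation medium maps to the path seen after chrooting. `path_inside_chroot` is a hypothetical helper that only manipulates strings; it does not call chroot itself.

```shell
#!/bin/sh
# path_inside_chroot NEWROOT PATH
# Prints how PATH is named once NEWROOT has become "/".
# Fails (non-zero status) if PATH lies outside NEWROOT.
path_inside_chroot() {
    newroot=${1%/}   # drop a trailing slash so /mnt/ and /mnt behave the same
    path=$2
    case "$path" in
        "$newroot")   printf '/\n' ;;
        "$newroot"/*) printf '%s\n' "${path#"$newroot"}" ;;
        *)            return 1 ;;   # not reachable from inside the new root
    esac
}

path_inside_chroot /mnt/ /mnt/usr/bin/bash       # /usr/bin/bash
path_inside_chroot /mnt  /mnt/boot/vmlinuz-linux # /boot/vmlinuz-linux
```

Anything outside /mnt, such as the live medium's own /usr/bin, has no name inside the new root, which is why the function reports failure for it.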

pirateprentice · 2019-08-19 20:58:39

My apologies, I DID mean pacstrap, I followed the installation guide to the letter and just got it wrong here. First post updated.

Thanks for this. Wonderfully detailed response; however, it has allowed me to add a bit to my question. My confusion lies in the fact that, as I understand it, a filesystem such as ext4 could be used for other OS than just linux. It doesn't superimpose a directory structure like /; instead, / is just a symbol pointing to some part of the disk (to use a simplified explanation). To clarify, I don't think the mkfs command actually creates /etc/ for instance, but instead creates an ext4 partition, and the kernel then superimposes an abstraction over it starting with /.

I could see how I could associate a location with that symbol, but I don't understand how those references persist across reboots without writing to the system. Don't be too harsh, I know I must be seriously confused here but I can't work it out.

I don't know where the kernel is stored and I don't know how, when I mount a device, that information persists when I reboot the computer.

When I type something like cd /etc/locale.gen (just as an example), I understand that the kernel (?) has associated that symbol with some part of my drive if I've mounted / to the device, but where is this information stored so that I don't have to remount every time I reboot?

Update
I've thought about this a lot more and I think I have a working model in my head, so I am going to summarize it here and ask that some kind soul tell me if I'm essentially correct, or wildly wrong.
When you do a fresh install, make a filesystem, and mount it for the root directory, then install base, pacstrap starts placing things in different areas of the disk depending on the filesystem format (e.g., ext4), and then when grub is installed it contains some reference to where the kernel is stored (?). When the computer boots up, the BIOS is hardwired to look for grub in a specific location, and grub points to the kernel. The kernel then starts referencing the tree structure according to what it understands (e.g., as /etc/) and the processor (?) then says "Okay, this is an ext4 partition, so /home is gonna be at x location".

Last edited by pirateprentice (2019-08-19 21:15:17)

graysky · 2019-08-19 21:14:03

You are forcing the partition of your choosing to be defined as / by running pacstrap on the mount point.

mount /dev/sda2 /mnt/new
pacstrap /mnt/new base

Make sense?

pirateprentice · 2019-08-19 21:19:05

graysky wrote:

You are forcing the partition of your choosing to be defined as / by running pacstrap on the mount point.
mount /dev/sda2 /mnt/new
pacstrap /mnt/new base
Make sense?

Sure, but I'm confused as to how that can be persistent across reboots. I think grub must contain a reference to the /, and bios knows where grub is, is this correct?
And then I'm confused how the filesystem is navigated. There must exist somewhere a mapping between, for instance, /etc/ and some sector on the disk. Does pacstrap create this file?

graysky · 2019-08-19 21:24:13

pirateprentice wrote:

Sure, but I'm confused as to how that can be persistent across reboots. I think grub must contain a reference to the /, and bios knows where grub is, is this correct?
And then I'm confused how the filesystem is navigated. There must exist somewhere a mapping between, for instance, /etc/ and some sector on the disk. Does pacstrap create this file?

Yes. Look at your /boot/grub/grub.cfg where this is defined. The partition you created is defined on the partition table on the disk so yes, it too has a roadmap of the physical space. Yes, pacstrap (I believe) calls pacman which untars the packages in the proper hierarchy among other tasks.

Trilby · 2019-08-19 21:25:36

pirateprentice wrote:

as I understand it, a filesystem such as ext4 could be used for other OS than just linux

Correct. Though while Windows can access ext4 (I think with special tools) I'm pretty sure a windows OS could not be installed to an ext4 filesystem. But that caveat is a bit beside the point. You are certainly correct that ext filesystems can be used in other OSs.

In linux, when a device is attached (ext4 or otherwise) a device node is created by the kernel (nowadays in collaboration with udev) at a location like /dev/sda. Another step needs to be taken to mount the device to a location to read and write to its contents. In Windows it's not so different, except that those two steps are essentially tied together and happen behind the scenes: the new device is plugged in, and the Windows OS creates a device node (or similar) and immediately assigns a drive letter to it. The drive letter (e.g., E:) is comparable to a mount point (in linux /mnt or /home, or whatever you assign it to be).

In either case, the device itself has its own internal directory structure. Let's say the device contains the following:

mytext.txt
documents/
documents/mydocument.doc
documents/important_stuff/tax_return.pdf
music/
music/pirated_crap.mp4

So it has one file and two directories at its root, and then some more files and a subdirectory under those directories. When this device is attached to a Windows machine, the tax return would be found here (for example):

E:\documents\important_stuff\tax_return.pdf

But note that drive letter "E:" may vary. On linux it depends where it is mounted. If the device is mounted on /home/ then the tax return would be available here:

/home/documents/important_stuff/tax_return.pdf

Or if it were mounted on /mnt/ then that same file would be available here:

/mnt/documents/important_stuff/tax_return.pdf

Note that either /home/ or /mnt/ are functionally similar to the drive letter in windows. The difference is that Windows uses only single letter locations for new devices; in linux/unix systems a new device can be mounted effectively on any existing directory.

pirateprentice wrote:

I don't know where the kernel is stored

When the system is running, it is (generally) at /boot/vmlinuz-linux. But on many common setups, /boot/ is a mount point for a device. In my case I mount /dev/sda1 on /boot, so my kernel is available at /boot/vmlinuz-linux, but it is actually stored on the top level of the sda1 device. When I was installing this system, sda1 was mounted on /mnt/boot/ and the kernel was initially copied to /mnt/boot/vmlinuz-linux.

pirateprentice wrote:

and I don't know how, when I mount a device, that information persists when I reboot the computer.

Ah ... great question. It doesn't!

It really doesn't. While the system is running, the kernel keeps track of what is mounted where (just type `mount` on its own to see all the current mounts the kernel is tracking).

When you shut down, everything is unmounted, and the kernel stops running, so nothing is storing that information anymore. On the next boot, the init system (systemd currently as the default in arch) mounts all the file systems again. All our "stuff" is in the same place after we reboot as it was before only because the init system mounts everything to the same places that it did last time.

The instructions to ensure the init system mounts everything to the correct place are in /etc/fstab. This is a list of what devices get mounted to what directory. If you were to screw up your /etc/fstab file, the init system could fail to get things right and the information would not seem to persist. (Note: with systemd there are some exceptions to this as it has mechanisms other than fstab now - but for a long time and with every other init system what I've described was 100% the case - now it's mostly the case).
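As a rough illustration of what that list looks like, the sketch below reads fstab-style text and prints which device goes on which directory. The sample contents are an assumption made up to match the earlier sda1/sda2/sdb1 example (including the ext4 type), and `list_fstab_mounts` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical /etc/fstab contents matching the earlier sda1/sda2/sdb1 example.
SAMPLE_FSTAB='# <device>  <mount point>  <type>  <options>  <dump>  <pass>
/dev/sda2   /      ext4  defaults  0  1
/dev/sda1   /boot  ext4  defaults  0  2
/dev/sdb1   /home  ext4  defaults  0  2'

# list_fstab_mounts: read fstab text on stdin, print "mount point <- device (type)".
list_fstab_mounts() {
    while read -r dev mp fstype rest; do
        case "$dev" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        printf '%s <- %s (%s)\n' "$mp" "$dev" "$fstype"
    done
}

printf '%s\n' "$SAMPLE_FSTAB" | list_fstab_mounts
# / <- /dev/sda2 (ext4)
# /boot <- /dev/sda1 (ext4)
# /home <- /dev/sdb1 (ext4)
```

Running `list_fstab_mounts < /etc/fstab` on an installed system shows the real list. Note the mount points are /, /boot and /home rather than /mnt/...: by the time the init system reads this file, the installed root is the actual root.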

Of course this just passes the buck ... how does the init system find /etc/fstab when nothing is mounted?!

In your boot loader configuration, something specifies the "root" filesystem. In fact, the kernel accepts a command line parameter for "root=..." specifying which of the many devices/partitions is supposed to be the root of the tree that is (re)created at each boot up.

The kernel (effectively) mounts that partition creating the root filesystem on which everything else will be attached. Then the kernel starts the init process, which looks for something at /etc/fstab. If /etc/fstab isn't there, stuff breaks.
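A small sketch of picking the "root=" parameter out of a kernel command line. The command line string here is a made-up example of what a boot loader entry might pass, and `root_from_cmdline` is a hypothetical helper that returns the first "root=" word it finds.

```shell
#!/bin/sh
# root_from_cmdline CMDLINE
# Prints the value of the first root=... word in CMDLINE; fails if there is none.
root_from_cmdline() {
    for word in $1; do          # unquoted on purpose: split on whitespace
        case "$word" in
            root=*) printf '%s\n' "${word#root=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical command line, for illustration only.
root_from_cmdline 'BOOT_IMAGE=/vmlinuz-linux root=/dev/sda2 rw quiet'   # /dev/sda2

# On a running Linux system the real command line is in /proc/cmdline:
#   root_from_cmdline "$(cat /proc/cmdline)"
```

The value is often a UUID or label rather than a device name (for example root=UUID=...); the helper returns whatever follows "root=" unchanged.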

Conceptually, this is effectively how it works. In reality, this was how it worked once upon a time. Now, in reality, there are a few added details around the initramfs. You may want to learn those details down the road, but I suspect it may be a bit soon for that.

EDIT:

pirateprentice wrote:

and bios knows where grub is, is this correct?

Yes. On BIOS/MBR systems, part of the executable code of the bootloader is copied to a specific and predefined location on the physical disk device. This is actually outside any of the "accessible" space - in other words, it is not a "file" created in any directory; it is raw data written to a physical location of the disk not within any of the partitions. The BIOS firmware executes the machine code at that specific location when the system turns on.
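To illustrate "raw data at a fixed location", the sketch below builds a 512-byte file standing in for the first sector of a disk and marks it the way an MBR boot sector is marked: the two bytes 0x55 0xAA at offset 510. It works on a temporary file only, never on a real device, and `mbr_signature` is a hypothetical helper name.

```shell
#!/bin/sh
# A plain 512-byte file standing in for sector 0 of a disk (not a real device).
img=$(mktemp)
trap 'rm -f "$img"' EXIT
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null

# Write the boot signature 0x55 0xAA at byte offset 510 (octal escapes for portability).
# conv=notrunc keeps the rest of the "sector" intact.
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null

# mbr_signature FILE: print the two bytes at offset 510 as hex.
mbr_signature() {
    od -An -tx1 -j510 -N2 "$1" | tr -d ' \n'
}

mbr_signature "$img"; echo    # 55aa
```

In a real MBR the bytes before offset 446 hold the boot loader's first-stage machine code and bytes 446 to 509 hold the partition table; none of this is a file in any directory, which is why it is addressed by offset rather than by path.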

In EFI systems, the firmware itself can read a filesystem: a small FAT-formatted partition on the disk called the EFI system partition. Machine-executable "efi" binaries are stored there as ordinary files, and the firmware's non-volatile memory (on the MOBO, not on a "disk") holds the boot entries that point to them. These binaries function similarly to BIOS boot loaders (though they're generally called "boot managers" as their actual role is a bit different).

Last edited by Trilby (2019-08-19 21:30:36)

pirateprentice · 2019-08-19 21:29:38

Thank you!! This is exactly the kind of answer I was looking for.

Last edited by pirateprentice (2019-08-19 21:45:41)

progandy · 2019-08-19 22:38:28

Here are some articles that have pretty good explanations as well:

https://access.redhat.com/documentation … ess-basics (Follow the red arrow at the end for more)

https://wiki.archlinux.org/index.php/Arch_boot_process

Last edited by progandy (2019-08-19 22:39:07)

Lone_Wolf · 2019-08-21 14:21:05

emphasis by me

graysky wrote:

pirateprentice wrote:
Sure, but I'm confused as to how that can be persistent across reboots. I think grub must contain a reference to the /, and bios knows where grub is, is this correct?
And then I'm confused how the filesystem is navigated. There must exist somewhere a mapping between, for instance, /etc/ and some sector on the disk. Does pacstrap create this file?
Yes. Look at your /boot/grub/grub.cfg where this is defined. The partition you created is defined on the partition table on the disk so yes, it too has a roadmap of the physical space. Yes, pacstrap (I believe) calls pacman which untars the packages in the proper hierarchy among other tasks.

The partition table only has information about the start and end of the partition.
The roadmap that links physical sectors with folders & files is created and maintained by the filesystem.

A well-known proprietary filesystem even took its name from the roadmaps it used and how many bits were available for an entry.
