Looking for a Little Filesystem Explanation

TrevorNT · 2012-02-27 02:50:08

Hi, Arch Community! Me and a few friends of mine from my college are trying to put together a very simple operating system (partially as a project to prove to ourselves we actually learned something from computer science; partially because making an operating system, even a small and simple one, is just pretty darn awesome). So me and one other person tasked ourselves with writing a simple file system.

The problem I'm having is that, as I'm studying how filesystems work exactly, I have only a very fuzzy picture of how UNIX-based filesystems work. I tried Googling it but didn't find anything that really helped me understand. I know they work by using structures called inodes, index nodes which store information on each file. I know there's a superblock, located at the beginning of each partition, which defines most of the filesystem, and several sites give the breakdown of just what goes into a filesystem's superblock. And I know each inode links, directly and (singly, doubly, triply) indirectly, to the blocks of data which make up a file. Aside from those few bits of data, I don't understand the concept very well.
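To make the direct/indirect idea concrete, here is a minimal C++ sketch of an inode and of the arithmetic that decides which pointer level covers a given block of a file. The 12 direct pointers, the 32-bit block numbers, the 512-byte block size, and every name in the code are assumptions chosen for illustration; they are not taken from any specific filesystem's on-disk format.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed parameters for this sketch only.
constexpr std::size_t   kDirect       = 12;   // direct pointers per inode
constexpr std::uint64_t kBlockSize    = 512;  // bytes per block
constexpr std::uint64_t kPtrsPerBlock = kBlockSize / sizeof(std::uint32_t);  // 128

// Hypothetical inode: metadata plus pointers to data blocks.
struct Inode {
    std::uint32_t size_bytes;
    std::uint32_t direct[kDirect];   // block numbers of the first 12 data blocks
    std::uint32_t single_indirect;   // block holding 128 data-block numbers
    std::uint32_t double_indirect;   // block holding 128 single-indirect blocks
    std::uint32_t triple_indirect;   // block holding 128 double-indirect blocks
};

enum class Level { Direct, Single, Double, Triple, OutOfRange };

// Which pointer level holds the n-th (0-based) data block of a file?
inline Level level_for_block(std::uint64_t n) {
    if (n < kDirect) return Level::Direct;
    n -= kDirect;
    if (n < kPtrsPerBlock) return Level::Single;
    n -= kPtrsPerBlock;
    if (n < kPtrsPerBlock * kPtrsPerBlock) return Level::Double;
    n -= kPtrsPerBlock * kPtrsPerBlock;
    if (n < kPtrsPerBlock * kPtrsPerBlock * kPtrsPerBlock) return Level::Triple;
    return Level::OutOfRange;
}
```

With these numbers, small files never touch an indirect block at all, which is the point of the scheme: the common case costs one lookup, and only large files pay for the extra levels.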

So I figured I'd ask you guys, the Arch community, because I'm an Arch user and I've come here asking for help before and I know most of you are pretty darn smart. How does the file system structure in UNIX-based filesystems operate?

Thanks,
TrevorNT

/dev/zero · 2012-02-27 03:06:38

You're biting off too much at once, IMO. I would suggest starting by working through some tutorials on FUSE. There are good tutorials for both C and Python. You should also work through tutorials on writing a Linux kernel module.

With both of these skillsets in place, you'll be much better positioned to understand existing filesystems and/or hack something up of your very own.

TrevorNT · 2012-02-27 04:11:28

/dev/zero wrote:

You're biting off too much at once, IMO. I would suggest starting by working through some tutorials on FUSE. There are good tutorials for both C and Python. You should also work through tutorials on writing a Linux kernel module.
With both of these skillsets in place, you'll be much better positioned to understand existing filesystems and/or hack something up of your very own.

Lol, you're probably right as far as doing too much at once. But I still need to have it done one way or another, and my group is counting on me to deliver. And I definitely will work on FUSE and KM's too, given that I've always wondered exactly how they work anyway.

But still, I am interested in whatever you can tell me about the structure of a simple UNIX filesystem. Understand that when I say "simple", I mean that for now at least I just want to be able to store and retrieve small amounts of data on a hard disk. I'm interested in starting without folder structure, getting access/retrieval working first; then adding metadata about the files; finally, making folder structures work. If you still think that's too much at once, then I'll start from the KM and FUSE and go from there.
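A first step along those lines, storing and retrieving a small blob with no folders and no metadata, might look roughly like the C++ sketch below. It uses an in-memory stand-in for the disk; `RamDisk`, `store`, and `retrieve` are hypothetical names invented for this example, and the caller is assumed to remember where the data starts and how long it is.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::size_t kBlockSize = 512;
using Block = std::array<std::uint8_t, kBlockSize>;

// In-memory stand-in for a disk that can only be accessed one block at a time.
class RamDisk {
public:
    explicit RamDisk(std::size_t block_count) : blocks_(block_count) {}
    std::size_t block_count() const { return blocks_.size(); }
    void write_block(std::size_t n, const Block& b) { blocks_.at(n) = b; }
    const Block& read_block(std::size_t n) const { return blocks_.at(n); }
private:
    std::vector<Block> blocks_;  // value-initialized, so every block starts zeroed
};

// Write `data` into consecutive blocks starting at `first_block`.
// Returns the number of blocks used; the last block is zero-padded.
inline std::size_t store(RamDisk& disk, std::size_t first_block, const std::string& data) {
    const std::size_t needed = (data.size() + kBlockSize - 1) / kBlockSize;
    if (first_block + needed > disk.block_count())
        throw std::out_of_range("not enough blocks on disk");
    for (std::size_t i = 0; i < needed; ++i) {
        Block block{};
        const std::size_t offset = i * kBlockSize;
        const std::size_t len = std::min(kBlockSize, data.size() - offset);
        std::copy_n(data.begin() + static_cast<std::ptrdiff_t>(offset), len, block.begin());
        disk.write_block(first_block + i, block);
    }
    return needed;
}

// Read `length` bytes back, starting at `first_block`.
inline std::string retrieve(const RamDisk& disk, std::size_t first_block, std::size_t length) {
    std::string out;
    out.reserve(length);
    for (std::size_t i = 0; out.size() < length; ++i) {
        const Block& block = disk.read_block(first_block + i);
        const std::size_t len = std::min(kBlockSize, length - out.size());
        out.append(block.begin(), block.begin() + static_cast<std::ptrdiff_t>(len));
    }
    return out;
}
```

The fact that the caller has to remember the start block and the length is exactly the gap that per-file metadata fills in the next step.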

Thanks,
TrevorNT

/dev/zero · 2012-02-27 04:24:34

Well, although I have ideas on how to get started understanding it, I'm not an expert in any filesystem - speaking of which, I guess you mean ext2?

My (probably flawed) understanding is the filesystem is just a database. You have a table at the start of the disk which stores all the (seen and unseen) metadata for a file. For example, in say ext2 (but also other Unix filesystems as well), this look-up table will have a list of all file addresses, their size, permissions, which user/group they belong to, atime etc etc.

Part of the purpose of the filesystem module for your kernel is to interface with this table and keep it tidy. Another purpose is to use the file address and size information to actually affect the files. In ext2 (etc), I think the addresses are scattered fairly randomly over the drive subject to constraints about files not bumping into each other.
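The "filesystem as a database" picture can be sketched in C++ as a table of records plus a check that no two files claim the same blocks. This mirrors the simplified model described here rather than ext2's real layout (ext2, for instance, keeps file names in directory entries, not alongside the other metadata). `FileRecord`, `MetadataTable`, the field widths, and the assumption that each file occupies one contiguous run of 512-byte blocks are all invented for the example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint64_t kBlockSize = 512;

// One row of the hypothetical look-up table.
struct FileRecord {
    std::string   name;
    std::uint32_t first_block;  // address of the file's first data block
    std::uint32_t size_bytes;
    std::uint16_t mode;         // permission bits
    std::uint16_t uid;
    std::uint16_t gid;
    std::int64_t  atime;        // last access time, seconds since the epoch
};

// Blocks occupied by a file; an empty file is treated as holding one block.
inline std::uint64_t blocks_used(const FileRecord& r) {
    const std::uint64_t n = (std::uint64_t{r.size_bytes} + kBlockSize - 1) / kBlockSize;
    return n == 0 ? 1 : n;
}

class MetadataTable {
public:
    // Adds a record unless the name is taken or its blocks overlap another file's.
    bool add(const FileRecord& r) {
        const std::uint64_t r_end = r.first_block + blocks_used(r);
        for (const FileRecord& e : records_) {
            const std::uint64_t e_end = e.first_block + blocks_used(e);
            const bool overlap = r.first_block < e_end && e.first_block < r_end;
            if (e.name == r.name || overlap) return false;
        }
        records_.push_back(r);
        return true;
    }

    // Returns the record for `name`, or nullptr if there is none.
    const FileRecord* find(const std::string& name) const {
        for (const FileRecord& e : records_)
            if (e.name == name) return &e;
        return nullptr;
    }

private:
    std::vector<FileRecord> records_;
};
```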

But take all this with a grain of salt. I'm just pulling it out of thin air, based only on some light reading, a little playing around with fuse, and some experience with Linux (but not as a developer).

TrevorNT · 2012-02-27 04:52:33

/dev/zero wrote:

Well, although I have ideas on how to get started understanding it, I'm not an expert in any filesystem - speaking of which, I guess you mean ext2?
My (probably flawed) understanding is the filesystem is just a database. You have a table at the start of the disk which stores all the (seen and unseen) metadata for a file. For example, in say ext2 (but also other Unix filesystems as well), this look-up table will have a list of all file addresses, their size, permissions, which user/group they belong to, atime etc etc.
Part of the purpose of the filesystem module for your kernel is to interface with this table and keep it tidy. Another purpose is to use the file address and size information to actually affect the files. In ext2 (etc), I think the addresses are scattered fairly randomly over the drive subject to constraints about files not bumping into each other.
But take all this with a grain of salt. I'm just pulling it out of thin air, based only on some light reading, a little playing around with fuse, and some experience with Linux (but not as a developer).

Ext2 was floated as a possibility between me and my friends, but ultimately we decided that the easiest UNIX-like filesystem to implement would be sfs (simple file system), thanks to the info on os-dev. It's even more stripped down than ext2, having no permissions system (a detail we would work out later, once we're all a fair amount more experienced than we are now).

My basic understanding of a filesystem is that there's a superblock at the beginning (information originally garnered from the first time I broke my filesystem in Arch and saw the infamous superblock error <_< ) which details a lot of information about what goes on in that partition, including metadata like creation and last mount times, the filesystem type, and a pointer to the root inode (/); then there's a large table of inode numbers and the names of the files they represent, where the table is fixed in size from creation of the partition; after this comes a space that actually holds all the inodes and their associated data, in which each inode holds some of the metadata about the file, but no information about its structure or name, nor the actual data, which is only referenced from here; and then finally, the rest of the drive is data referenced and organized by the preceding inode structure. And I know the most basic unit of a hard drive, storage-wise and in terms of a filesystem, is a block; in our case we're working with 512-byte blocks.
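As a rough C++ sketch of the superblock part of that picture: a small fixed-size record that fits in the first 512-byte block and can be written out and read back. The field list follows the description above (timestamps, a type marker, a root inode pointer), but the exact fields, their widths, the `kMagic` value, and the host-endian byte order are assumptions made for illustration, not any real filesystem's format.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

constexpr std::size_t   kBlockSize = 512;
constexpr std::uint32_t kMagic     = 0x53465331;  // arbitrary placeholder marker

using Block = std::array<std::uint8_t, kBlockSize>;

// Hypothetical superblock: describes the partition as a whole.
struct Superblock {
    std::uint32_t magic;            // identifies the filesystem type
    std::uint32_t block_size;       // bytes per block
    std::uint64_t total_blocks;     // size of the partition in blocks
    std::uint64_t created_at;       // creation time
    std::uint64_t last_mounted_at;  // last mount time
    std::uint32_t root_inode;       // inode number of "/"
    std::uint32_t inode_count;      // fixed when the partition is created
};

static_assert(std::is_trivially_copyable<Superblock>::value,
              "must be safe to copy with memcpy");
static_assert(sizeof(Superblock) <= kBlockSize,
              "superblock must fit in a single block");

// Serialize into a zero-padded block, ready to be written to block 0.
inline Block to_block(const Superblock& sb) {
    Block b{};
    std::memcpy(b.data(), &sb, sizeof sb);
    return b;
}

// Parse block 0; returns false if it does not look like our filesystem.
inline bool from_block(const Block& b, Superblock& out) {
    std::memcpy(&out, b.data(), sizeof out);
    return out.magic == kMagic && out.block_size == kBlockSize;
}
```

The magic-number check in `from_block` is the kind of test that fails when a superblock is damaged, which is roughly what a "bad superblock" mount error is reporting.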

The more I look into it though, the more I think my ideas are wrong. It's possible there's more structure to the filesystem than I think. I'm here to try to figure out if I'm right or (the more likely scenario) just how far off I am.

Thanks,
TrevorNT

/dev/zero · 2012-02-27 04:57:33

haha, sounds like you know more than me already! Well, it's an interesting topic anyway. I hadn't heard of sfs before, so that will give me something to start with if I ever walk down this path myself.

TrevorNT · 2012-02-27 05:14:24

/dev/zero wrote:

haha, sounds like you know more than me already! Well, it's an interesting topic anyway. I hadn't heard of sfs before, so that will give me something to start with if I ever walk down this path myself.

Well, I had to do a fair amount of research to get this far. I had a lot of help, including the people in my group and the guy who runs my university's operating system engineers club. I'm in my sophomore year and just getting into assembly, so a lot of the low-level groundwork was written for me and my group. We're writing the main parts of the OS in C++, the language we all know best, though I do know C as well. (As for Python, I've been told multiple times that I should learn it because it's a worthy language to know, but it appears I lack the self-discipline to teach myself...)

Oh yeah, in case you're interested, here's a short OS Dev wiki article about SFS. This is what I'm going off of.

(Correction to my earlier post: it looks as though the inode entries in SFS go after the data area. I did not realize that until just now somehow o_o . Maybe I'm too busy absorbing the details to realize the order they're supposed to all go in...)
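With the index entries placed after the data area, the region boundaries can be worked out from a handful of sizes. The C++ sketch below assumes a layout of reserved blocks (including the superblock), then the data area, then free space, then an index area that ends exactly at the end of the volume and is sized in bytes. Those details and all names here are assumptions for illustration and should be checked against the wiki article before relying on them.

```cpp
#include <cstdint>

// Computed region boundaries for the assumed layout:
// [reserved][data area][free space][index area, ending at end of volume]
struct SfsLayout {
    std::uint64_t data_first_block;  // first block of the data area
    std::uint64_t free_first_block;  // first block after the data area
    std::uint64_t free_blocks;       // whole blocks between data and index areas
    std::uint64_t index_first_byte;  // byte offset where the index area begins
};

// Returns false if the requested areas do not fit on the volume.
inline bool compute_layout(std::uint64_t total_blocks,
                           std::uint64_t reserved_blocks,
                           std::uint64_t data_blocks,
                           std::uint64_t index_bytes,
                           std::uint64_t block_size,
                           SfsLayout& out) {
    if (block_size == 0) return false;
    // Blocks touched by the index area, rounding a partial block up.
    const std::uint64_t index_blocks = (index_bytes + block_size - 1) / block_size;
    if (reserved_blocks + data_blocks + index_blocks > total_blocks) return false;

    out.data_first_block = reserved_blocks;
    out.free_first_block = reserved_blocks + data_blocks;
    out.free_blocks      = total_blocks - index_blocks - out.free_first_block;
    out.index_first_byte = total_blocks * block_size - index_bytes;
    return true;
}
```

For example, a 2048-block volume with 512-byte blocks, 1 reserved block, 100 data blocks, and 640 bytes of index puts the data area at blocks 1 through 100 and starts the index area 640 bytes before the end of the volume, leaving the space in between free for either area to grow into.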

Thanks,
TrevorNT
