I've been using JFS as my root filesystem ever since I started with Linux and so far have never had any issues... except this.
About a year ago or so I was testing some 3D games, and it wasn't uncommon for my box to freeze completely at times. So one day I ended up pressing the reset button several times, which of course led to several fsck runs. I wasn't watching the screen the whole time, so I didn't notice anything in particular. But later that day I ran 'pacworld' - a little Python script (in the AUR) that checks whether all files from the 'pacman -Ql' output are really present - and noticed that my complete /usr/man/man3 hierarchy was gone. I wouldn't have been surprised about some stuff in /var or dotfiles in ~, but I was mostly installing/testing/removing games at that time, and I doubt any of those would install anything to /usr/man/man3.
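(If you don't have pacworld handy, a rough equivalent of that check can be done with pacman alone - a minimal sketch, not the actual pacworld script; the output file name is just an example:)
# List every file pacman thinks is installed and report the ones
# missing from the filesystem (roughly what pacworld does).
pacman -Ql | awk '{print $2}' | while read -r f; do
    [ -e "$f" ] || echo "missing: $f"
done > missing-files.txt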
Okay, fast forward to last Tuesday.
I had three Arch-boxen (all with JFS on /) running at the same time, doing mostly maintenance stuff (-Syu, backups, log cleaning etc), and I thought it would be a good idea to run a full fsck on all of them. So I ran 'sudo touch /forcefsck' and eventually rebooted each of them over the course of the day.
Now, I had exactly the same thing happen on the other two (yes, both) machines! The first box (the one from the story above) fscked without problems - its man3 dir had also been fully repopulated by all the package rebuilds since then.
Affected box A was only recently installed, has a /boot partition (ext2) and a / on LVM, and runs KDEmod.
Box B is an 8-year-old notebook with a 2-year-old Arch install, no LVM, a Win2k dual-boot, running Xfce.
The only JFS mount option on all my installations is "noatime", and as you surely know, there is nothing to tweak with jfs_mkfs, besides the label.
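(For reference, a JFS root entry with that option would look something like this in /etc/fstab - the device name here is simply the one from box A below, adjust as needed:)
# JFS root mounted with noatime
/dev/mapper/lvm-arch   /   jfs   defaults,noatime   0   1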
This is the output from box A:
jfs_fscklog version 1.1.12, 24-Aug-2007
processing started: 12/18/2007 14.3.20 [xchkdsk.c:1452]
The current device is: /dev/mapper/lvm-arch [xchkdsk.c:1527]
Open(...READ/WRITE EXCLUSIVE...) returned rc = 0 [fsckpfs.c:3233]
Primary superblock is valid. [fsckmeta.c:1551]
The type of file system for the device is JFS. [xchkdsk.c:1544]
Block size in bytes: 4096 [xchkdsk.c:1857]
Filesystem size in blocks: 2097152 [xchkdsk.c:1864]
**Phase 0 - Replay Journal Log [xchkdsk.c:1871]
LOGREDO: Log already redone! [logredo.c:555]
logredo returned rc = 0 [xchkdsk.c:1903]
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries [xchkdsk.c:1996]
File system object DF176158 has corrupt data (1). [fsckdtre.c:1867]
**Phase 2 - Count links [xchkdsk.c:2087]
Inode F176153 has incorrect link count. [fsckconn.c:783]
Inode F327074 has incorrect link count. [fsckconn.c:783]
Inode F328844 has incorrect link count. [fsckconn.c:783]
Inode F328864 has incorrect link count. [fsckconn.c:783]
Incorrect link counts have been detected. Will correct. [fsckconn.c:798]
**Phase 3 - Duplicate Block Rescan and Directory Connectedness [xchkdsk.c:2120]
**Phase 4 - Report Problems [xchkdsk.c:2198]
File system object DF176158 is linked as: /usr/man/man3 [fsckino.c:320]
cannot repair the data format error(s) in this directory. [xchkdsk.c:1206]
cannot repair DF176158. Will release. [xchkdsk.c:1244]
**Phase 5 - Check Connectivity [xchkdsk.c:2230]
No paths were found for inode F16044. [fsckconn.c:311]
No paths were found for inode F16045. [fsckconn.c:311]
No paths were found for inode F16046. [fsckconn.c:311]
No paths were found for inode F16047. [fsckconn.c:311]
No paths were found for inode F16048. [fsckconn.c:311]
No paths were found for inode F16049. [fsckconn.c:311]
No paths were found for inode F16050. [fsckconn.c:311]
No paths were found for inode F16065. [fsckconn.c:311]
No paths were found for inode F16066. [fsckconn.c:311]
No paths were found for inode F16067. [fsckconn.c:311]
No paths were found for inode F16068. [fsckconn.c:311]
etc...
A subsequent full fsck run finally put 7150 files in /lost+found. :/
Have you ever encountered something similar with JFS?
No? Then would you mind trying this for yourself?
Do this (a rough script covering steps 1 and 4b follows the list):
1) Make a list of the packages owning files in /usr/man/man3, just in case it really goes *poof* (if you really care about section 3, tar it up as well)
find /var/lib/pacman/local/ -name files | xargs fgrep /man/man3/ | cut -d: -f1 | sort -u | awk -F/ '{print $6}' > man3pkgs
Or maybe it's even a package problem with some funky encoded filename, in which case the list might come in handy for tracking that down later.
2) Touch /forcefsck and reboot
3) Watch the fsck output
4a) Nothing wrong? Lucky you :)
4b) Something fishy? Run jfs_fscklog -f man3.fsck -e /dev/whatever and post about it here.
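If you'd rather script it, here's a rough sketch covering steps 1 and 4b (untested; adjust the device name, and the forced fsck itself still needs the touch/reboot from steps 2 and 3):
# Step 1: record which packages own files under /usr/man/man3...
find /var/lib/pacman/local/ -name files | xargs fgrep /man/man3/ | cut -d: -f1 | sort -u | awk -F/ '{print $6}' > man3pkgs
# ...and keep a copy of the section 3 manpages, just in case
tar czf man3-backup.tar.gz /usr/man/man3
# Step 4b (after the reboot, if fsck complained): extract the fsck log...
jfs_fscklog -f man3.fsck -e /dev/whatever
# ...and display it in readable form
jfs_fscklog -d -f man3.fsck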
So much for that.
Now run and get those last Christmas presents!
Offline
I have checked this on two different computers: I did not notice any problems. /usr/man/man3 is still there.
Offline
same here..no issue. JFS on all partitions.
I need real, proper pen and paper for this.
Offline
I was able to replicate this problem once on a laptop running JFS on a LUKS encrypted root. Here is the fsck output:
processing started: 12/22/2007 16.4.29 [xchkdsk.c:1452]
The current device is: /dev/mapper/root [xchkdsk.c:1527]
Open(...READ/WRITE EXCLUSIVE...) returned rc = 0 [fsckpfs.c:3233]
Primary superblock is valid. [fsckmeta.c:1551]
The type of file system for the device is JFS. [xchkdsk.c:1544]
Block size in bytes: 4096 [xchkdsk.c:1857]
Filesystem size in blocks: 14151127 [xchkdsk.c:1864]
**Phase 0 - Replay Journal Log [xchkdsk.c:1871]
LOGREDO: Log already redone! [logredo.c:555]
logredo returned rc = 0 [xchkdsk.c:1903]
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries [xchkdsk.c:1996]
File system object DF617696 has corrupt data (3). [fsckdtre.c:2339]
**Phase 2 - Count links [xchkdsk.c:2087]
Inode F207933 has incorrect link count. [fsckconn.c:783]
Incorrect link counts have been detected. Will correct. [fsckconn.c:798]
**Phase 3 - Duplicate Block Rescan and Directory Connectedness [xchkdsk.c:2120]
**Phase 4 - Report Problems [xchkdsk.c:2198]
File system object DF617696 is linked as: /usr/man/man3 [fsckino.c:320]
cannot repair the data format error(s) in this directory. [xchkdsk.c:1206]
cannot repair DF617696. Will release. [xchkdsk.c:1244]
**Phase 5 - Check Connectivity [xchkdsk.c:2230]
No paths were found for inode F80291. [fsckconn.c:311]
No paths were found for inode F80292. [fsckconn.c:311]
...
Indeed, /usr/man/man3 was gone after running the forced fsck. However, once I reinstalled all the affected packages, I reran the forced fsck with no problems. As this problem seems to be happening on JFS partitions sitting on top of the device mapper (LVM or dm-crypt), I'm thinking this could be a device mapper issue. It's also possible that something is going wrong during installation with a particular package that uses the /usr/man/man3 directory.
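For reference, reinstalling the affected packages can be done straight from the man3pkgs list suggested earlier in the thread - a rough sketch (note that those entries carry a version-release suffix that has to be stripped first):
# Strip the version-release suffix from the local-db directory names
# in man3pkgs and hand the bare package names back to pacman.
pacman -S $(sed 's/-[^-]*-[^-]*$//' man3pkgs)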
It's an interesting problem as fsck never runs into the higher phases of checks unless there is damage to the JFS log. I will put a note in the JFS wiki regarding this.
This may warrant a bug report, but I honestly don't think there is enough information at the moment to say whether this is an issue with JFS or with the device mapper. Has anyone seen this behaviour on an LVM (or LUKS-encrypted) root using ext3 or XFS?
Last edited by PDExperiment626 (2007-12-22 06:58:55)
... and for a time, it was good...
Offline
JFS chkdskSvcLog<^@processing started: 12/22/2007 10.29.45 [xchkdsk.c:1452]
^@8^@The current device is: /dev/sda8 [xchkdsk.c:1527]
^@45H^@Open(...READ/WRITE EXCLUSIVE...) returned rc = 0 [fsckpfs.c:3233]
^@Ü^_^@4^@Primary superblock is valid. [fsckmeta.c:1551]
^@ [D^@The type of file system for the device is JFS. [xchkdsk.c:1544]
^@
0^@Block size in bytes: 4096 [xchkdsk.c:1857]
^@.8^@Filesystem size in blocks: 2096474 [xchkdsk.c:1864]
^@4^@**Phase 0 - Replay Journal Log [xchkdsk.c:1871]
^@64^@LOGREDO: Log already redone! [logredo.c:555]
^@
^@6,^@logredo returned rc = 0 [xchkdsk.c:1903]
^@X^@**Phase 1 - Check Blocks, Files/Directories, and Directory Entries [xchkdsk.c:1996]
^@H^@File system object DF509280 has corrupt data (3). [fsckdtre.c:2339]
^@x,^@**Phase 2 - Count links [xchkdsk.c:2087]
^@<^@Inode F392478 has incorrect link count. [fsckconn.c:783]
^@L^@Incorrect link counts have been detected. Will correct. [fsckconn.c:798]
^@T^@**Phase 3 - Duplicate Block Rescan and Directory Connectedness [xchkdsk.c:2120]
^@90^@**Phase 4 - Report Problems [xchkdsk.c:2198]
^@L^@File system object DF509280 is linked as: /usr/man/man3 [fsckino.c:320]
^@:P^@cannot repair the data format error(s) in this directory. [xchkdsk.c:1206]
^@20<^@cannot repair DF509280. Will release. [xchkdsk.c:1244]
^@ 4^@**Phase 5 - Check Connectivity [xchkdsk.c:2230]
^@:<^@No paths were found for inode F389137. [fsckconn.c:311]
^@ <^@No paths were found for inode F389138. [fsckconn.c:311]
(countless similar lines - about 1500)
(The file contains some weird characters...)
I can confirm this error: my man3 folder was reported as having an incorrect link count and was subsequently deleted. I don't use anything special like encryption or containers, but it is a logical partition, if that helps.
Offline
wuischke: Run 'jfs_fscklog -d -f <extracted file>' on the extracted file; that'll take care of the formatting.
Thanks, guys. I wasn't sure at first about posting this, but it seems I was right in considering this more than just bad luck.
I'll wait some more and go through http://sourceforge.net/tracker/?group_i … tid=712756 and http://sourceforge.net/mailarchive/foru … discussion in the meantime. Perhaps somebody else would like to browse there as well or even post about this, because so far I'm not sure where to start in order to track this down.
Offline
I've seen stuff like this happen all the time. JFS seems to have a marked tendency to lose files on forced reboots - and not files that are being written or edited at the time, but files that were written long ago and weren't being accessed when the power went off.
Also, I've seen forced reboots appear to corrupt JFS partitions to the point that they are completely unreadable - or even to the point that they no longer show up in the partition table, the gods only know how. I have no idea what the problem is; the best I can guess is that jfs_fsck sometimes overwrites stuff for some reason.
Offline
I'm now doubting my choice of JFS as a reliable fs, but it's been there for me for about a year. Certainly, I have faced one of the above-mentioned issues - the one where old files get lost after a forced reboot. I have one JFS partition that shows up as NTFS in cfdisk but as JFS in other partition tools. It was NTFS before, but it's still weird.
I need real, proper pen and paper for this.
Offline
Kensai: are you saying that using deadline mitigates these issues?
Offline
I too had an experience with a corrupted JFS file system (I was testing out suspend-to-RAM on my laptop). During many of the trials I was forced to manually shut down my system after a failed resume. After a few times of doing this, I ran a fsck on all my partitions and noticed that several critical system files had been irreversibly corrupted. My advice: if you expect to be working in an environment where something could force you into a manual reboot, I would suggest something other than JFS for now. Use ext* if you anticipate possible system-wide freezes.
What I do now, if I think an action in the near future could lead to an improper shutdown, is remount all file systems read-only and carry out my tasks from a write-less environment. Using JFS is okay, in my opinion, as long as you regularly back up your system and run a fsck on the partition _immediately after_ you suspect corruption could have taken place. This may mean having to reboot from a rescue disk in order to save your system files, but the likelihood of file system corruption seems to increase as the number of forced shutdowns without a follow-up fsck increases.
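For the curious, the remount part is just a mount option change - a minimal sketch (the mount points are examples, and a filesystem with files open for writing will refuse the read-only remount):
# Remount the writable filesystems read-only before doing something risky...
mount -o remount,ro /
mount -o remount,ro /home
# ...and back to read-write afterwards
mount -o remount,rw /
mount -o remount,rw /home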
Without error there can be no brilliancy. ― Emanuel Lasker
Offline
It's really an incredible shame - JFS offers the best overall performance of any Linux filesystem.
Out of curiosity, has anyone seen this on a distro other than Arch?
Offline
I'm already using elevator=deadline in my GRUB config, so that can't be the solution to the problem. But still, I feel the urge to create a new backup soon... my next file system will be FFS (because I want to try out OpenBSD).
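(For anyone wondering where that goes: the scheduler is set on the kernel line in GRUB's menu.lst - a hedged example, the kernel path and root device are placeholders:)
# /boot/grub/menu.lst - kernel line with the deadline I/O scheduler
kernel /vmlinuz26 root=/dev/sda3 ro elevator=deadline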
I personally think it is a problem with fsck.jfs, because I couldn't see anything wrong with the man3 directory in question when I did some testing. But I'll do some further testing on my machine at home; let's see.
Offline
Gullible Jones wrote: It's really an incredible shame - JFS offers the best overall performance of any Linux filesystem.
Out of curiosity, has anyone seen this on a distro other than Arch?
JFS only logs operations on meta-data, maintaining the consistency of the filesystem structure, but not necessarily the data. A crash might result in stale data, but the files should remain consistent and usable.
http://www.linux.com/feature/119025?theme=print
Maybe this is something to consider while using this filesystem. It's almost impossible to find any real information about the reliability of JFS.
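For comparison, ext3 does let you choose how file data (not just metadata) relates to the journal, via a mount option - a hedged fstab sketch, the devices and mount points are placeholders:
# ext3 journaling modes, from fastest to safest for file data:
#   data=writeback - metadata-only journaling, roughly what JFS guarantees
#   data=ordered   - data blocks written before metadata commits (ext3 default)
#   data=journal   - file data goes through the journal as well
/dev/sda5   /       ext3   defaults,data=ordered   0   1
/dev/sda6   /home   ext3   defaults,data=journal   0   2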
Use UNIX or die.
Offline
Kensai: are you saying that using deadline mitigates these issues?
Nah, it just performs better. I have had some power outages while using JFS and none of them has led to a corrupted file system or any data loss with JFS.
Offline
Gullible Jones wrote: It's really an incredible shame - JFS offers the best overall performance of any Linux filesystem.
Out of curiosity, has anyone seen this on a distro other than Arch?
JFS only logs operations on meta-data, maintaining the consistency of the filesystem structure, but not necessarily the data. A crash might result in stale data, but the files should remain consistent and usable.
http://www.linux.com/feature/119025?theme=print
Maybe this is something to consider while using this filesystem. It's almost impossible to find any real information about the reliability of JFS.
Logging only metadata shouldn't matter. The problem is the deletion of files that were already written to the hard drive and were not being accessed or edited at the time of the crash. That should *never* happen, on any filesystem, under any circumstances barring physical failure of the hard drive, and I have seen JFS do it consistently on brand-new hard drives where other filesystems work fine.
Offline
>and I have seen JFS do it consistently on brand new hard drives, when other filesystems work fine.
Just on Arch, or on other distros as well?
Use UNIX or die.
Offline
Let me confirm this... it's not an Arch-specific problem. I've been using JFS on my laptop ever since Linux got onto it, and I've faced the old-files problem, albeit only a few times. On most occasions it's not deletion but a reverting of changes, a la rollback. Examples would be application profiles and persistent caches.
I'm guessing it's more user-related. We may be doing something that doesn't tango with JFS, since testimonials such as kensai's and the above source's do exist.
Last edited by schivmeister (2007-12-23 18:23:49)
I need real, proper pen and paper for this.
Offline
Commenting on a few of the words from the source provided by oli,
The main design goal of JFS was to provide fast crash recovery for large file systems, . . . JFS only logs operations on meta-data, maintaining the consistency of the file system structure, but not necessarily the data. A crash might result in stale data, but the files should remain consistent and usable.
If one assumes that JFS keeps the structure of the file system in order more efficiently than the alternatives do, then it naturally follows that JFS may be a good choice when considering that primary function alone. On the other hand, as several voices here attest, typical usage seems to lead, in some cases, to data corruption, due either to an unrecognized "user-related" practice, as schivmeister has suggested, or to an inadequate design choice at the heart of the file system's implementation.
If you back up your data frequently, then the trade-off may be worth it to you: you give up the convenience of not having to take persistent steps to minimize data loss, in exchange for a file system that, by its very nature, works to maintain "the consistency of the file system structure." My intuition tells me that if the user performs regular backups, then JFS can be trusted for normal productive use.
Without error there can be no brilliancy. ― Emanuel Lasker
Offline
Seeing your discussion, guys, I'm starting to think my testing of JFS has not been thorough enough. Maybe I have had these problems with JFS and old files, but never looked closely enough to actually notice anything. This worries me now. I guess ext3/4 is the only good file system I have left to use. Anyway, I'll just keep using JFS to see how it fares.
Offline
I have been using JFS for a little over a year now. I've used JFS in conjunction with LVM, software RAID, and disk encryption, and I never experienced data-loss issues until the current topic of this thread. As I posted earlier, this could very well be a problem with the device mapper, or possibly a problem with an old Arch package that uses /usr/man/man3. Even though I was able to replicate this problem, once I reinstalled all the affected packages the problem no longer occurred.
As for all the claims of JFS (or any file system) just up and losing files, I regard them with some skepticism. When I worked as a network analyst, I would frequently hear such claims, only to find out later that the problem was due to user error, a poorly written script, or a bug in some higher-level application. Unless the problem can be replicated, claims that 'this frequently happens with JFS' are of little help.
If people are going to make claims that this is a JFS problem, they should be verifying that it doesn't happen on systems using ext3, ReiserFS, etc. under similar circumstances. There is also the possibility that JFS may flush out hardware errors that other file systems don't (i.e. different file systems distribute data differently across the physical disk). In the end, keep in mind that IBM specifically optimized its JFS port to work with Linux; it's not a hack undertaken by third-party developers. Given that, and the fact that the port has been around for a number of years, I am reluctant to immediately assume that the above recollections of JFS just 'losing' data are indeed the fault of JFS.
... and for a time, it was good...
Offline
If people are going to make claims that this is a JFS problem, they should be verifying that it doesn't happen on systems using ext3, ReiserFS, etc. under similar circumstances. [...] I am reluctant to immediately assume that the above recollections of JFS just 'losing' data are indeed the fault of JFS.
Very interesting point. So we need more testing. That's the only problem with JFS: there's not enough data about it. But then again, I think this is the case with almost every other file system.
Offline
FWIW bug reports have been filed (not by me):
http://sourceforge.net/tracker/index.ph … tid=712756
http://sourceforge.net/tracker/index.ph … tid=712756
The first one is six months old, the second is from last week. Both are open.
Offline
I have, unfortunately, twice lost ALL data on JFS due to a corrupt filesystem. I don't really remember anything specific about the errors, but it could not be repaired. I decided to use ReiserFS instead, and I haven't had any problems with it since. For my part, I'm not going to risk data loss again, since I don't keep backups.
I might just have been very unlucky, but I don't feel like taking any unnecessary risks.
Offline
If people are going to make claims that this is a JFS problem, they should be verifying that it doesn't happen on systems using ext3, ReiserFS, etc. under similar circumstances. There is also the possibility that JFS may flush out hardware errors that other file systems don't (i.e. different file systems distribute data differently across the physical disk). In the end, keep in mind that IBM specifically optimized its JFS port to work with Linux; it's not a hack undertaken by third-party developers.
Precisely why I can't bring myself to see this as a JFS fault... but so far nothing has happened again; things have been good.
I need real, proper pen and paper for this.
Offline