Hello,
I have finally taken the plunge of installing systemd. And what a plunge it is... My system now boots only about once in 20 attempts. The last line it prints is 'Starting Virtual Consoles'. I did the following:
1. created the config files, and moved everything except the daemons line out of rc.conf, booted again - everything fine
2. installed systemd without systemd-sysvcompat
3. changed service file for syslog-ng
4. adjusted service file for slim (and xfce)
5. tried to reboot
6. system does not boot - mostly hanging after the mentioned line, but I have seen udev problems as well
- if I remove init=/bin/systemd from grub, the boot hangs at processing udev events
7. I keep power-cycling the machine until it boots
any ideas would be welcome!
Offline
From 6. it looks like you have problems even when booting with initscripts? If so, this is likely not a systemd, but a kernel (module) problem. How long do you wait while "processing udev events" hangs? There should be a timeout after 2 or 3 minutes.
Offline
I waited a minute at most, so I do not really know. You are right that it points to some general problem; however, until I added init=/bin/systemd to my kernel parameters I did not have any problems. Maybe re-generating the grub config broke something. Should I paste some config or log file?
Offline
If you manage to boot into initscripts by removing init= from the kernel commandline, then I can't imagine what might be wrong. Maybe you upgraded some other software at the same time as testing out systemd? The kernel?
Try increasing the debug output of udev in /etc/udev/udev.conf.
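As a rough sketch of that change: the snippet below assumes the stock `udev_log` key in /etc/udev/udev.conf. The helper name `set_udev_log` is made up for illustration, and `sed -i` assumes GNU sed. It can be pointed at a copy of the file first.

```shell
# Hypothetical helper: set the udev_log level in a udev.conf-style file.
# Usage: set_udev_log FILE LEVEL   (LEVEL is e.g. err, info or debug)
set_udev_log() {
    conf=$1
    level=$2
    if grep -q '^[#[:space:]]*udev_log=' "$conf"; then
        # Replace an existing (possibly commented-out) udev_log line.
        sed -i "s|^[#[:space:]]*udev_log=.*|udev_log=\"$level\"|" "$conf"
    else
        # No such line yet: append one.
        printf 'udev_log="%s"\n' "$level" >> "$conf"
    fi
}
```

Run as root, this would be `set_udev_log /etc/udev/udev.conf debug`. Debug output is very verbose, so it is probably worth setting the level back once the hang has been captured.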
Offline
Now I could boot again.
udev did not time out. Then I tried to boot with systemd again. Sometimes I received an error from something like usb_submit, and a couple of times an error on mounting my disk (emask 0x04 timeout), after which systemd complained that it cannot mount the disks.
Is it possible that my disk (it is an SSD) is damaged, and that systemd's parallel startup somehow exposed this? If so, can I do anything about it, such as somehow moving data away from the bad sectors, until I can get a new disk?
Offline
Does smartmon tools work with ssds? If so, I'd suggest running the extended test on the disk to check for errors.
Have you fscked the filesystems?
CLI Paste | How To Ask Questions
Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L
Offline
indeed it seems to me the SSD is the problem
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always       -       0/5451854
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   096   096   000    Old_age   Always       -       4274h+19m+14.950s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       934
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       88
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       2
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   128   129   000    Old_age   Always       -       128 (0 127 0 129 0)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/5451854
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/5451854
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/5451854
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       281470681743460
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       4840
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2991
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2991
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       3551
thank you for your help
Offline
"indeed it seems to me the SSD is the problem"
Why? That output looks OK to me. My own disk actually looks rather less healthy...
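One way to make "looks OK" a bit more concrete: smartctl treats an attribute as failed when its normalized VALUE has dropped to or below THRESH, and the WHEN_FAILED column is `-` for attributes that have never failed. The sketch below applies that check to the attribute table. The function name `check_smart_attrs` is invented here, it assumes the ten-column layout shown in the post above, and treating a THRESH of 000 as "never fails" is this sketch's own assumption.

```shell
# Hypothetical helper: read a smartctl attribute table on stdin and print
# every attribute that has failed or is at/below its threshold.
# Exit status is 0 when nothing was flagged, 1 otherwise.
check_smart_attrs() {
    awk '
        # Attribute rows start with a numeric ID and have a 0x.... FLAG field.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ {
            value  = $4 + 0
            thresh = $6 + 0
            if ($9 != "-" || (thresh > 0 && value <= thresh)) {
                printf "ATTENTION %s %s value=%d thresh=%d when_failed=%s\n",
                       $1, $2, value, thresh, $9
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

Typical use would be `smartctl -A /dev/sdX | check_smart_attrs`. Fed the table posted above, it prints nothing, which matches the reading that the attributes themselves show no failure.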
Run
smartctl -a /dev/sdX
to get a fuller output. Run the extended test on the disk unless you run them regularly and have recent data in the log. (This will be included in the output of the above command together with any errors encountered during normal operation and other details of the disk.)
Offline
here it is
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.4.9-1-ARCH] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: SandForce Driven SSDs
Device Model: OCZ-VERTEX3
Serial Number: OCZ-V5FC2OJ0N884J4JQ
LU WWN Device Id: 5 e83a97 f37fd2339
Firmware Version: 2.02
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ACS-2 revision 3
Local Time is: Mon Aug 27 23:09:03 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x7f) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0021) SCT Status supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always       -       0/7340110
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   096   096   000    Old_age   Always       -       4276h+30m+30.850s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       934
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       88
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       2
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   128   129   000    Old_age   Always       -       128 (0 127 0 129 0)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/7340110
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/7340110
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/7340110
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       270544284942436
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       4852
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2996
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2996
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       3552
SMART Error Log not supported
SMART Self-test Log not supported
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Offline
I have taken photos from the hanging boots.
https://plus.google.com/photos/10574973 … h_HTn8bQGg
There are 3 "types" of them:
* Started setup virtual console
* Starting to load kernel variables
* cannot reset port 6 / ata1.00 failed command - here are more photos ending with the IO errors
Offline
So run the tests on the disks. I'm not sure why it says logging is supported and then says the log isn't but if you run the tests by hand, you can see what happens. I'd run the extended self-test.
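A minimal sketch of running the extended test by hand. The `run` wrapper, the `DRY_RUN` switch and the two function names are illustrative additions, and `/dev/sdX` is a placeholder. By default the script only prints the commands it would run. Since this drive reports that the self-test log is not supported, the status step also calls `smartctl -c`, whose output includes the "Self-test execution status" field.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Start the extended (long) self-test; the drive runs it in the background.
selftest_start() {
    run smartctl -t long "$1"
}

# Check progress and results once the test has had time to finish.
selftest_status() {
    run smartctl -c "$1"            # capabilities, incl. self-test execution status
    run smartctl -l selftest "$1"   # self-test log, if the drive keeps one
}
```

To run it for real, as root: `DRY_RUN=0; selftest_start /dev/sdX`, then `selftest_status /dev/sdX` after the recommended polling time (the output posted earlier reports 48 minutes for the extended test).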
Do you run the tests regularly? You need to edit the config file when you install the tools for this to happen (and start the daemon at boot). My output shows a record of 21 tests which the tools have run for me in accordance with my config.
I can't see the pictures you've posted. (I have issues with images in Firefox which I've never been sufficiently motivated to solve properly although no doubt I should...)
Your data is backed up, right?
Offline
hello,
I had a lot of work, so I did not have time to try to fix the system. During the week, however, I bought a new disk and re-installed Arch from scratch. Unfortunately, it is still no better.
sysvinit does not even start; it hangs indefinitely on some udev event processing. systemd starts about one time in twenty. Sometimes there is no error and it just hangs. Sometimes there is an ata error similar to the ones in the images (in the previous post).
Is it possible that the motherboard or some other component is broken? The hard disk is new, so I think that can be ruled out. Any ideas?
Offline
If it booted the live media to install, I don't see how the motherboard could be broken. Or did you do the installation with a different machine?
Did you use the latest install media? If so, which instructions did you follow and did you get any errors?
How long did you wait for udev to time out this time?
What errors? I may not be the only one who can't see the images and, in any case, saying they are "similar" doesn't tell us much.
What do the logs say (sysvinit) if you are getting any? Can you boot to a rescue shell?
Offline
I used the latest media and followed the official installation manual plus the LUKS encryption manual (I chose simple passphrase authentication). I installed on the same machine. Booting from the installation disc also hung from time to time. I waited more than five minutes; the notebook fan got louder, but nothing else happened. After the installation it is the same: the boot hangs at different points in the output, or sometimes there are ata errors:
ata1.00 status { DRDY }
failed command: READ FPDMA QUEUED
hub: cannot disable port 6 (err = -110)
in dmesg I see only this error:
[ 9.213600] iTCO_wdt: cannot register miscdev on minor=130 (err=-16)
[ 9.214029] iTCO_wdt: probe of iTCO_wdt failed with error -16
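For posting the relevant errors as text rather than photos, a small filter over the kernel log may help. This is only a sketch: the function name `storage_errors` is made up, and the patterns are simply taken from the messages quoted in this thread (ata errors, "failed command", "emask", and the USB hub port errors), so other relevant lines could be missed.

```shell
# Hypothetical helper: keep only kernel log lines that look like the
# ATA / USB port errors quoted in this thread. Reads from stdin.
storage_errors() {
    grep -iE '(^|[^[:alnum:]])ata[0-9]+(\.[0-9]+)?[: ]|failed command|emask|cannot (reset|disable) port'
}
```

On a boot that does come up, `dmesg | storage_errors` would show whether the same errors also occur on "good" boots. Note that grep exits with status 1 when nothing matches.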
The machine does boot sometimes; for instance, I am using it now to post this message.
Here is a bug report from the Ubuntu guys, who had similar problems as well. It says I should add the irqpoll kernel option in grub:
https://bugs.launchpad.net/ubuntu/+sour … bug/204916
On a SUSE forum they say something about a kernel option:
http://forums.opensuse.org/english/get- … -drdy.html
Here is another suggestion from the OCZ forums:
http://www.ocztechnologyforum.com/forum … d-commands
Here they write that it could be the SATA chipset:
http://superuser.com/questions/121391/s … -icrc-abrt
I will try these as well to see whether that helps or not.
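If grub here means GRUB 2 with an /etc/default/grub file (an assumption; the thread does not say which grub version is installed), adding an option such as irqpoll could look roughly like the sketch below. The helper name `add_kernel_param` is made up, `sed -i` assumes GNU sed, and the parameter must not contain `|` or `&` because it is pasted into the sed expression.

```shell
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of a grub defaults file.
# Usage: add_kernel_param FILE PARAM
add_kernel_param() {
    file=$1
    param=$2
    # Do nothing if the parameter is already on that line.
    grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param" && return 0
    # Insert the parameter just before the closing quote.
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"|" "$file"
}
```

As root this would be `add_kernel_param /etc/default/grub irqpoll`, followed by `grub-mkconfig -o /boot/grub/grub.cfg` to regenerate the config. For a one-off test it is simpler to edit the kernel line from the grub menu at boot, which leaves the installed config untouched.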
Last edited by peletomi (2012-09-02 18:27:41)
Offline
To rule out a hardware problem, why don't you create a different live media and test that? That is, something other than Arch. That won't guarantee any problem is hardware because a linux bug might affect both, but if it works fine, you'll be able to be pretty sure it isn't hardware.
Offline