Cheers to all.
For a few weeks now I've been experiencing several application crashes and system freezes that force me to do a hard reboot (my keyboard is connected via USB, and when Xorg freezes the keyboard is no longer detected, so I can't switch to a TTY).
At first I opened a bug report on KDE via DrKonqi for one of the application and/or daemon crashes. The reply was that the crash happened in the NVIDIA driver, so I exchanged emails with NVIDIA's bug service, but I couldn't give them enough information (my crashes are completely random and I can't reproduce them).
In the meantime, several updates have been released for both KDE and NVIDIA, but nothing has changed.
Examining journalctl I found some errors starting with "BUG:", but these errors often differ from one another and are not always present when the full freeze happens.
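Because the "BUG:" messages differ between freezes, it may help to tally them across several boots before comparing. A minimal Python sketch, assuming the journal of a frozen boot was exported to a plain-text file (for example with `journalctl -b -1 > journal.txt`; the file name is hypothetical). Hex addresses are masked so that otherwise identical messages group together, which is a simplifying assumption:

```python
import re
from collections import Counter

# Addresses change between occurrences; mask 0x-prefixed values and bare
# 12-16 digit hex words so that similar BUG messages are counted together.
HEX = re.compile(r"0x[0-9a-fA-F]+|\b[0-9a-f]{12,16}\b")


def summarize_bugs(lines):
    """Count kernel 'BUG:' messages, grouped after masking addresses."""
    counts = Counter()
    for line in lines:
        idx = line.find("BUG:")
        if idx == -1:
            continue  # not a BUG line
        message = line[idx + len("BUG:"):].strip()
        counts[HEX.sub("<addr>", message)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical export of the previous boot's journal.
    with open("journal.txt", encoding="utf-8", errors="replace") as fh:
        for message, n in summarize_bugs(fh).most_common():
            print(n, message)
```

Running it over the journals of several frozen boots would show whether one BUG message recurs or whether they are genuinely unrelated.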
Sometimes there are also gaps of several minutes (ten or more) in my journal, and these gaps often occur just before the freeze.
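Gaps like these can be located mechanically. The sketch below assumes the journal was exported with `journalctl -o short-iso`, so each entry starts with an ISO 8601 timestamp; the 10-minute threshold and the function name are illustrative choices, not something journalctl provides:

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (before, after) timestamp pairs where consecutive journal
    entries are further apart than `threshold`.

    Assumes lines from `journalctl -o short-iso`, whose first field looks
    like 2020-01-16T22:29:09+0100.
    """
    gaps = []
    previous = None
    for line in lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue
        try:
            stamp = datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S%z")
        except ValueError:
            continue  # "-- Reboot --" markers, continuation lines, etc.
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

Each returned pair marks the last entry before a silence and the first entry after it, which is where to look for the lines logged just before a freeze.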
I've uploaded some logs here from the boots that ended in a freeze; they also contain the backtraces of some application crashes:
https://www.mediafire.com/folder/kkh0beagyhnmv/
I hope some of you can make sense of them and give me directions or suggestions to solve this problem.
If you need more logs please tell me what you need and I will provide them.
Thank you in advance.
There's no indication for any kind of nvidia related crash in those logs.
There's an impressive number of steam and waterfox crashes, some stuck CPUs, a kwin crash without backtrace and some stuck CPUs through ksysguard accesses, but the troublemakers seem to be the aforementioned processes. Is waterfox the binary release from the AUR?
Where's the KDE bug you filed? Link?
Consider running memtest86 (several cycles, e.g. at least overnight).
When it freezes, do the lights on the keyboard flash?
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way
Hello. Thank you for help.
Yes, waterfox is the binary release from the AUR.
I've also had several freezes while using mpv player.
This is the bug I've filed on KDE:
https://bugs.kde.org/show_bug.cgi?id=414394
I had already tried running memtest86: no errors detected after two complete test cycles. I'll try running several more.
By the way, a while ago I upgraded my RAM from 4 to 8 GB; could that be related?
Added the dmidecode output to the mediafire folder.
The keyboard lights do NOT flash when the PC freezes.
UPDATE:
I ran memtest86+ for 4 passes, about 3 hours. No errors detected.
I can't leave the PC on overnight because it's in the room where I sleep (it's a desktop), so it's difficult to run the test for longer.
(Maybe I'll try again some day when I'm away for work or something for several hours.)
I tried forcing multithreaded mode (by pressing F2) to get more passes in fewer hours, but in that mode the test always freezes completely during test "#7 block move" and I can only force a reboot.
Screenshots are in the mediafire folder.
Last edited by bugandy (2020-01-02 14:03:27)
Hi. Sorry for the bump. I ran memtest86+ for more than six hours and six passes, with no errors detected at all (screenshot is in the mediafire folder).
At this point I think my RAM is fine.
Do you have any other ideas about what I could try to do to find the origin of the crashes?
Thank you.
Let's assume the waterfox and steamwebhelper crashes are because of waterfox (assuming steamwebhelper gets its mozilla module from there) and actually unrelated to the "general" freezes which would then be completely unknown.
Following the GPU theory and omitting HW issues, I'd try the LTS kernel and the 390xx driver series to see whether either, alone or combined, sidesteps the issue.
If the GPU does crash, there's a chance that the kernel itself is unaffected (you just have no visual output) and you might be able to ssh into the "frozen" system for closer inspection.
Hi.
Steam has also crashed several times under Wine: not the native client, but the Windows client installed in the Wine prefix. I haven't installed waterfox under Wine, so I don't think the Steam crashes are caused by waterfox.
In the last few weeks I played a Feral Interactive game, which is why the journal I posted only shows traces of the native Steam client; the crashes started much earlier, when I was playing a game under Wine.
Besides this, I think the freezes could be related to the crashes: sometimes, when Steam (the game) or mpv was about to freeze (stuttering, fps close to zero), I saved myself by quickly pressing Alt+Tab. That way only the application froze (without crashing), and I could kill it without forcing a reboot.
So it seems that full-screen applications lock the keyboard when they freeze, preventing me from using it to get out of them.
So probably, as you say, I just have no visual output.
Sorry for the missing information.
Also note that I've had other crashes besides Steam and waterfox; for example, in the bug I filed with KDE I reported a crash of the Kickoff menu.
and you might be able to ssh into the "frozen" system for closer inspections.
This sounds like an interesting idea, but I guess it requires another machine running Arch; my only other Linux machine runs Debian, not Arch.
Anyway, I have never used ssh, and I may need a lot of wiki reading to figure out how to get information through it.
PS: Monday, yesterday and today, no freezes and only one crash for me.
Last edited by bugandy (2020-01-08 15:49:58)
Hello. I ran into another freeze today. I tried to ssh into my system while it was frozen (ssh had been configured, enabled and tested beforehand) through an Android app, but the connection failed with this error message: "EHOSTUNREACH (No route to host)"
Fortunately, freezings seem to have dropped dramatically in the last few days.
Try to check the health of all hardware components:
- Power supply (voltages, in the BIOS)
- CPU cooling (too high a temperature = high risk of freezes, errors)
- GPU cooling, GPU capacitors
- Motherboard (capacitors)
- Hard disk (check the SMART values with smartmontools)
If you have a second graphics card, use it as a test.
You can also try a Linux live CD (Ubuntu, for example) to see whether the system freezes still occur (which would most likely mean a hardware problem).
If you are 100% sure that it's not a hardware problem, then it's a software/driver problem.
You can reinstall all your packages with a pacman command (only if the database is in a working state, not corrupted). First list the currently installed packages, either by parsing the /var/log/pacman.log file or with the command (pacman -Qq > pkg-all.list); see also the wiki on backing up the pacman package list.
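If the local pacman database were damaged, the package list could in principle be rebuilt from /var/log/pacman.log. A hedged Python sketch, assuming the `[ALPM] installed/upgraded/removed <name> (<version>)` line format used by recent pacman versions; `pacman -Qq` remains the authoritative source whenever the database is intact:

```python
import re

# Matches lines such as:
#   [2020-01-13T12:27:45+0100] [ALPM] installed foo (1.0-1)
# The exact log format is an assumption based on recent pacman versions.
ACTION = re.compile(
    r"\[ALPM\] (installed|reinstalled|upgraded|downgraded|removed) (\S+) \("
)


def packages_from_log(lines):
    """Replay pacman.log actions and return the packages still installed."""
    installed = set()
    for line in lines:
        match = ACTION.search(line)
        if not match:
            continue  # scriptlet output, transaction markers, warnings
        action, name = match.groups()
        if action == "removed":
            installed.discard(name)
        else:
            installed.add(name)
    return sorted(installed)


if __name__ == "__main__":
    with open("/var/log/pacman.log", encoding="utf-8", errors="replace") as fh:
        print("\n".join(packages_from_log(fh)))
```

The result includes dependencies as well as explicitly installed packages, since the log does not distinguish them.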
Last edited by Potomac (2020-01-13 12:27:45)
Hi. Thank you for the answers.
My voltages in the BIOS are:
Vcore = 1.152v
DRAM Voltage = 1.512v
I don't know how to determine whether they're correct.
Temperatures are fine: the CPU is 35-37°C idle and 50-52°C under stress (higher in summer); the GPU is about 43°C idle and 70°C under stress.
I don't know how to check the capacitors, but there's no sizzling/crackling in my case.
This is the SMART output for my system disk (SSD):
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       23576
 12 Power_Cycle_Count       0x0032   091   091   000    Old_age   Always       -       8529
177 Wear_Leveling_Count     0x0013   091   091   000    Pre-fail  Always       -       306
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   099   099   010    Pre-fail  Always       -       2
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   099   099   010    Old_age   Always       -       1
183 Runtime_Bad_Block       0x0013   099   099   010    Pre-fail  Always       -       1
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   072   030   000    Old_age   Always       -       28
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   253   253   000    Old_age   Always       -       1
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       303
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       19521590929
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 9313 -
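To read a table like this quickly, the attribute rows can be parsed and checked against smartctl's convention that an attribute fails when its normalized VALUE is at or below a nonzero THRESH. A sketch, assuming `smartctl -A` output with the ten columns shown above; the watch list of error counters is a hypothetical choice, and attribute names vary between vendors:

```python
# Hypothetical watch list: error counters whose raw value is usually 0 on a
# healthy drive.
WATCHED = {"Reallocated_Sector_Ct", "Runtime_Bad_Block",
           "Uncorrectable_Error_Cnt", "CRC_Error_Count"}


def parse_smart_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric attribute ID and have 10+ columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9],  # first token only; some raw values carry extras
        })
    return rows


def failing(rows):
    """Attributes whose normalized value is at or below a nonzero threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


def nonzero_counters(rows, watched=WATCHED):
    """Watched error counters with a nonzero integer raw value."""
    return {r["name"]: int(r["raw"]) for r in rows
            if r["name"] in watched and r["raw"].isdigit() and int(r["raw"]) > 0}
```

Applied to the table above, `failing` returns an empty list, while `nonzero_counters` reports the raw count of 1 on Runtime_Bad_Block and CRC_Error_Count.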
As for using a live system, I don't know whether it would give an answer, because the freezes are never regular: sometimes a day or two goes by without a freeze, and sometimes there are several within a few hours.
Hi. Sorry for the bump.
After two or three days without freezes, this evening I ran into two more. After the last freeze my journal logged this at the end (following the usual 'unable to handle page fault for address' with a random trace):
gen 16 22:29:09 pc-andre kernel: note: Xorg[500] exited with preempt_count 1
gen 16 22:29:09 pc-andre kded5[558]: The X11 connection broke: I/O error (code 1)
gen 16 22:29:09 pc-andre konsole[891]: The X11 connection broke (error 1). Did the X11 server die?
gen 16 22:30:09 pc-andre kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
gen 16 22:30:09 pc-andre kernel: rcu: 3-...0: (4 ticks this GP) idle=53a/1/0x4000000000000000 softirq=11735/11737 fqs=5888 last_accelerate: 4d31/9387, Nonlazy posted: ..D
gen 16 22:30:09 pc-andre kernel: (detected by 1, t=18004 jiffies, g=22013, q=2666)
gen 16 22:30:09 pc-andre kernel: Sending NMI from CPU 1 to CPUs 3:
gen 16 22:30:09 pc-andre kernel: NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
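When comparing several freezes, it can be useful to check whether the same CPU is named each time. A minimal sketch that pulls CPU numbers out of RCU stall lines (the `3-...0:` form shown above) and hard-LOCKUP lines; the regular expressions only cover these two message shapes and are assumptions about the kernel's wording, which can differ between kernel versions:

```python
import re

# "rcu: 3-...0: (4 ticks this GP) ..." -> CPU 3 is the stalled one.
RCU_STALL = re.compile(r"rcu:\s+(\d+)-")
# "NMI watchdog: Watchdog detected hard LOCKUP on cpu 3"
HARD_LOCKUP = re.compile(r"hard LOCKUP on cpu (\d+)")


def stalled_cpus(lines):
    """Collect CPU numbers named in RCU stall and hard-lockup messages."""
    stalls, lockups = set(), set()
    for line in lines:
        match = RCU_STALL.search(line)
        if match:
            stalls.add(int(match.group(1)))
        match = HARD_LOCKUP.search(line)
        if match:
            lockups.add(int(match.group(1)))
    return {"rcu_stall": sorted(stalls), "hard_lockup": sorted(lockups)}
```

For the excerpt above this yields CPU 3 for both kinds of message; the same CPU showing up across many freezes would be worth mentioning in a bug report.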
Now I'll try uninstalling intel-microcode; let's see if this helps.
Another strange thing: during the first freeze I was able to ssh into my system. I tried to kill processes without success, then tried 'sudo shutdown -r 0'; ssh disconnected and I couldn't ssh in anymore, but the system kept showing the frozen image on screen. :-/
Wild guess, try to pass "rcutree.rcu_idle_gp_delay=1" to the kernel parameters…
Done.
Today Xorg crashed and dropped me back to the login screen with this error:
gen 17 17:21:23 pc-andre kglobalaccel5[571296]: qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
gen 17 17:21:23 pc-andre kglobalaccel5[571296]: This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: wayland-org.kde.kwin.qpa, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, xcb.
kglobalaccel was reported as crashing immediately after Xorg.
I've reinstalled kglobalaccel; searching for 'xcb' with pacman -Ss returns several packages, and I have no idea which one kglobalaccel is looking for.
pacman -Qkk kglobalaccel
In case there are errors: is there maybe an SSD trimming issue?
Output of
mount
No errors in the kglobalaccel files.
This is the check on all packages (I had selected English as the system language before getting the pacman output, but it was ignored):
# pacman -Qkk $(pacman -Qnq) | grep -v "0 file alterati"
warning: accountsservice: /var/lib/AccountsService/icons (Permissions mismatch)
warning: audit: /var/log/audit (Permissions mismatch)
accountsservice: 286 total files, 1 altered file
backup file: apache: /etc/httpd/conf/httpd.conf (Modification time mismatch)
backup file: apache: /etc/httpd/conf/httpd.conf (Size mismatch)
audit: 155 total files, 1 altered file
warning: cups: /etc/cups/classes.conf (Permissions mismatch)
warning: cups: /etc/cups/printers.conf (Permissions mismatch)
warning: cups: /etc/cups/subscriptions.conf (Permissions mismatch)
warning: filesystem: /srv/ftp (UID mismatch)
warning: filesystem: /srv/ftp (GID mismatch)
warning: filesystem: /srv/ftp (Permissions mismatch)
warning: filesystem: /srv/http (UID mismatch)
warning: filesystem: /srv/http (GID mismatch)
backup file: cups: /etc/cups/classes.conf (Modification time mismatch)
backup file: cups: /etc/cups/classes.conf (Size mismatch)
backup file: cups: /etc/cups/printers.conf (Modification time mismatch)
backup file: cups: /etc/cups/printers.conf (Size mismatch)
backup file: cups: /etc/cups/subscriptions.conf (Modification time mismatch)
backup file: cups: /etc/cups/subscriptions.conf (Size mismatch)
cups: 874 total files, 3 altered files
backup file: filesystem: /etc/fstab (Modification time mismatch)
backup file: filesystem: /etc/fstab (Size mismatch)
backup file: filesystem: /etc/group (Modification time mismatch)
backup file: filesystem: /etc/group (Size mismatch)
backup file: filesystem: /etc/gshadow (Modification time mismatch)
backup file: filesystem: /etc/gshadow (Size mismatch)
backup file: filesystem: /etc/passwd (Modification time mismatch)
backup file: filesystem: /etc/passwd (Size mismatch)
backup file: filesystem: /etc/profile (Modification time mismatch)
backup file: filesystem: /etc/profile (Size mismatch)
backup file: filesystem: /etc/resolv.conf (Modification time mismatch)
backup file: filesystem: /etc/resolv.conf (Size mismatch)
backup file: filesystem: /etc/shadow (Modification time mismatch)
backup file: filesystem: /etc/shadow (Size mismatch)
backup file: filesystem: /etc/shells (Modification time mismatch)
backup file: filesystem: /etc/shells (Size mismatch)
filesystem: 116 total files, 2 altered files
backup file: glibc: /etc/locale.gen (Modification time mismatch)
backup file: glibc: /etc/locale.gen (Size mismatch)
backup file: grub: /etc/default/grub (Modification time mismatch)
backup file: grub: /etc/default/grub (Size mismatch)
backup file: grub: /etc/grub.d/40_custom (Modification time mismatch)
backup file: grub: /etc/grub.d/40_custom (Size mismatch)
warning: java-runtime-common: /usr/lib/jvm/default (Symlink path mismatch)
warning: java-runtime-common: /usr/lib/jvm/default (Modification time mismatch)
warning: java-runtime-common: /usr/lib/jvm/default-runtime (Symlink path mismatch)
warning: java-runtime-common: /usr/lib/jvm/default-runtime (Modification time mismatch)
backup file: imagemagick: /etc/ImageMagick-7/policy.xml (Modification time mismatch)
backup file: imagemagick: /etc/ImageMagick-7/policy.xml (Size mismatch)
java-runtime-common: 21 total files, 2 altered files
error: error while reading package /var/lib/pacman/local/knetattach-5.17.5-2/mtree: Unrecognized archive format
warning: lib32-colord: /var/lib/colord (UID mismatch)
warning: lib32-colord: /var/lib/colord (GID mismatch)
warning: lib32-colord: /var/lib/colord/icc (UID mismatch)
warning: lib32-colord: /var/lib/colord/icc (GID mismatch)
knetattach: no mtree file
lib32-colord: 31 total files, 2 altered files
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.alias (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.alias (Size mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.alias.bin (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.alias.bin (Size mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.builtin.bin (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.dep (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.dep (Size mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.dep.bin (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.dep.bin (Size mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.devname (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.softdep (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.symbols (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.symbols (Size mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.symbols.bin (Modification time mismatch)
warning: linux: /usr/lib/modules/5.4.12-arch1-1/modules.symbols.bin (Size mismatch)
warning: mlocate: /var/lib/mlocate (GID mismatch)
warning: mlocate: /var/lib/mlocate (Permissions mismatch)
linux: 6762 total files, 9 altered files
mlocate: 141 total files, 1 altered file
backup file: pacman: /etc/pacman.conf (Modification time mismatch)
backup file: pacman: /etc/pacman.conf (Size mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (Modification time mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (Size mismatch)
backup file: pam: /etc/environment (Modification time mismatch)
backup file: pam: /etc/environment (Size mismatch)
backup file: php: /etc/php/php.ini (Modification time mismatch)
backup file: php: /etc/php/php.ini (Size mismatch)
backup file: php-apache: /etc/httpd/conf/extra/php7_module.conf (Modification time mismatch)
backup file: pinentry: /usr/bin/pinentry (Modification time mismatch)
backup file: pinentry: /usr/bin/pinentry (Size mismatch)
backup file: pulseaudio: /etc/pulse/daemon.conf (Modification time mismatch)
backup file: pulseaudio: /etc/pulse/daemon.conf (Size mismatch)
warning: shadow: /usr/bin/newgidmap (Permissions mismatch)
warning: shadow: /usr/bin/newuidmap (Permissions mismatch)
warning: systemd: /var/log/journal (GID mismatch)
shadow: 558 total files, 2 altered files
backup file: sudo: /etc/sudoers (Modification time mismatch)
backup file: sudo: /etc/sudoers (Size mismatch)
backup file: systemd: /etc/systemd/journald.conf (Modification time mismatch)
backup file: systemd: /etc/systemd/journald.conf (Size mismatch)
systemd: 1645 total files, 1 altered file
backup file: tor: /etc/tor/torrc (Modification time mismatch)
backup file: tor: /etc/tor/torrc (Size mismatch)
Most of them are configuration files in /etc which do not match the default ones.
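Since edited files under /etc are expected to differ, one way to focus on the rest is to drop the "backup file:" lines and group the remaining warnings by package. A minimal sketch, assuming English pacman messages; running pacman with `LC_ALL=C` (e.g. `LC_ALL=C pacman -Qkk`) should print untranslated messages regardless of the desktop language setting. The function name is hypothetical:

```python
from collections import defaultdict


def split_qkk(lines):
    """Split `pacman -Qkk` output into non-config warnings and errors.

    'backup file:' lines (locally edited config files) and the per-package
    summary lines are skipped; 'warning:' lines are grouped by package.
    """
    warned = defaultdict(list)
    errors = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("warning: "):
            # "warning: <package>: <path> (<reason>)"
            package, _, detail = line[len("warning: "):].partition(": ")
            warned[package].append(detail)
        elif line.startswith("error: "):
            errors.append(line[len("error: "):])
    return dict(warned), errors
```

In the output above, the error about the unreadable knetattach mtree file would stand out in `errors`, separately from the ownership and permission warnings.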
This is mount output:
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sys on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
dev on /dev type devtmpfs (rw,nosuid,relatime,size=4042916k,nr_inodes=1010729,mode=755)
run on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
/dev/sda2 on / type ext4 (rw,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=2649)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
/dev/sda1 on /media/chakra type ext4 (rw,nosuid,nodev,relatime,stripe=32659,user)
/dev/sda3 on /media/giochi type ext3 (rw,nosuid,nodev,relatime,user)
/dev/sdb3 on /media/part-b type ext3 (rw,nosuid,nodev,relatime,user)
/dev/sdc1 on /media/backup type ext3 (rw,nosuid,nodev,relatime,user)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=810052k,mode=700,uid=1000,gid=100)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=100)
Last edited by bugandy (2020-01-18 19:03:33)