
#1 2017-06-27 10:26:13

kokoko3k
Member
Registered: 2008-11-14
Posts: 2,390

Random crashes with linux > 4.10 (i think)

Since I upgraded to some kernel > 4.10, my always-on workstation sometimes crashes during the night.
Since I need this PC to stay on as much as possible, I set it to reboot after a crash, but obviously that way I lost the ability to read the crash data.
I have now discovered netconsole and set it up, so that I can read what happened from another machine acting as receiver. Here is the log.

Linux Gozer 4.11.3-1-ARCH #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017 x86_64 GNU/Linux:

giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661528] BUG: unable to handle kernel paging request at 000000000023cbd5
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661546] IP: __es_shrink+0x98/0x2c0 [ext4]
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661548] PGD 1e15e067
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661548] PUD 50634067
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661550] PMD 0
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661551]
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661554] Oops: 0002 [#1] PREEMPT SMP
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661556] Modules linked in: nvidia_uvm(PO) vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) ipt_MASQUERADE
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661582]  drm syscopyarea sysfillrect sysimgblt fb_sys_fops aes_x86_64 crypto_simd glue_helper snd_seq_oss 
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661610] CPU: 2 PID: 50 Comm: kswapd0 Tainted: P  R        O    4.11.3-1-ARCH #1
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661611] Hardware name: ASUS All Series/Z97-K, BIOS 2604 05/20/2015
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661613] task: ffff880214803800 task.stack: ffffc90000ff0000
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661617] RIP: 0010:__es_shrink+0x98/0x2c0 [ext4]
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661618] RSP: 0018:ffffc90000ff3c40 EFLAGS: 00010202
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661679] RAX: ffff880013491028 RBX: ffff880208e3ebf8 RCX: 000000000023cbcd
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661680] RDX: ffff880208e3ebf8 RSI: 0000000000000001 RDI: ffff880208e3ecc0
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661681] RBP: ffffc90000ff3c98 R08: 0000000000000001 R09: ffffffffa0289b00
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661682] R10: ffffc90000ff3b68 R11: ffff88021eff6000 R12: 0000000000000000
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661684] R13: ffff880013490ca8 R14: 000000000003185c R15: ffff880208e3e800
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661685] FS:  0000000000000000(0000) GS:ffff88021ed00000(0000) knlGS:0000000000000000
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661746] CR2: 000000000023cbd5 CR3: 000000004ccf5000 CR4: 00000000001406e0
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661747] Call Trace:
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661752]  ext4_es_scan+0xc5/0x140 [ext4]
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661755]  ? super_cache_count+0x67/0xd0
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661758]  shrink_slab.part.15+0x1da/0x3f0
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661761]  shrink_node+0x2f1/0x300
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661763]  kswapd+0x2cf/0x770
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661766]  kthread+0x125/0x140
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661768]  ? mem_cgroup_shrink_node+0x1c0/0x1c0
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661770]  ? kthread_create_on_node+0x70/0x70
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661772]  ret_from_fork+0x2c/0x40
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661774] Code: 0f 8e cb 00 00 00 49 8b 87 f8 03 00 00 48 39 c3 0f 84 8f 01 00 00 49 8b 87 f8 03 00 00 48 8b
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661794] RIP: __es_shrink+0x98/0x2c0 [ext4] RSP: ffffc90000ff3c40
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661795] CR2: 000000000023cbd5
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661858] ---[ end trace 9edf4fe365e2911e ]---
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661859] Kernel panic - not syncing: Fatal exception
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661863] Kernel Offset: disabled
giu 25 00:03:35 arch_ups netconsole.sh[208]: [212126.661866] Rebooting in 120 seconds..
giu 25 00:05:35 arch_ups netconsole.sh[208]: [212246.759170] ACPI MEMORY or I/O RESET_REG.
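
For anyone who wants to capture a trace like this one: netconsole sends kernel messages as plain UDP datagrams, so the receiving side can be very small. Below is a minimal sketch of a receiver, not the netconsole.sh script used here. The port 6666 (netconsole's default target port) and the names `decode_datagram`, `iter_lines` and `main` are assumptions for illustration; use whatever port the sender was configured with.

```python
import socket

# Assumption: the sender targets netconsole's default port. Adjust as needed.
NETCONSOLE_PORT = 6666


def decode_datagram(payload):
    """Turn one UDP payload into a list of non-empty log lines."""
    text = payload.decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if line.strip()]


def iter_lines(sock, max_datagrams=None):
    """Yield log lines from an already-bound UDP socket.

    Stops after max_datagrams datagrams, or never if it is None.
    """
    received = 0
    while max_datagrams is None or received < max_datagrams:
        payload, _sender = sock.recvfrom(65535)
        received += 1
        for line in decode_datagram(payload):
            yield line


def main():
    """Listen on all interfaces and print every line as it arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", NETCONSOLE_PORT))
        for line in iter_lines(sock):
            # Flush so the line reaches the journal or terminal right away.
            print(line, flush=True)
```

Calling `main()` from a service on the receiving machine keeps the last messages of a crash even when the crashed machine reboots and loses its own journal.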

That's it. Unfortunately the journal is gone, but I suspect that it happened (as it usually does) during an rsync backup against a remote WebDAV system.

Since I can tell that the good old 4.8.13-1 never crashed for me, the next step will be to bisect the (Arch) linux packages and then the commits. That would take months, though, because I'm unable to reproduce the crash systematically: it can happen every day or every five days...
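
To put a rough number on "months", here is a back-of-the-envelope sketch. The figures in the example (20 candidate builds, 10 days of waiting before calling a build good) are hypothetical, not counts taken from the Arch repositories, and the function names are made up for illustration.

```python
def bisect_steps(candidates):
    """Worst-case number of builds to test in a bisection.

    candidates: builds after the last known-good one, up to and
    including the first known-bad one.
    """
    if candidates < 1:
        raise ValueError("need at least one candidate")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, with no float rounding.
    return (candidates - 1).bit_length()


def worst_case_days(candidates, soak_days):
    """Upper bound on elapsed days if every tested build must run soak_days.

    A build that crashes early ends its step sooner, so this is pessimistic.
    """
    return bisect_steps(candidates) * soak_days


# Hypothetical example: 20 candidate builds, each run for 10 days
# (twice the longest gap between crashes mentioned above).
print(bisect_steps(20), worst_case_days(20, 10))
```

The waiting time per step dominates: because a crash can take up to five days to show up, a build can only be called good after running well past that, while the number of steps grows only with the logarithm of the range.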

Meanwhile, any thoughts?

Thanks.


Help me to improve ssh-rdp !
Retroarch User? Try my koko-aio shader !


#2 2017-06-27 14:18:01

arnds
Member
Registered: 2017-06-27
Posts: 1

Re: Random crashes with linux > 4.10 (i think)

I seem to have the same problem. Under heavy I/O load the system crashes and reboots randomly. I think these problems have occurred since kernel 4.11.3-1; the previous version, 4.11.2-1, ran for two weeks without any problems. When I copy larger files (>1G) from an external USB disk, the crash happens quite quickly, within a few minutes of starting the copy.

I haven't run any tests so far. I just wanted to let you know that I have the same problem and that it seems to be I/O related.


#3 2017-06-27 14:28:04

kokoko3k
Member
Registered: 2008-11-14
Posts: 2,390

Re: Random crashes with linux > 4.10 (i think)

I don't think it is related: I can copy large files without crashing, and I'm pretty sure it crashed on 4.10 several times too.
Anyway, thanks for posting; I may be wrong (testing 4.9.9 now).
-EDIT-
4.9.9 crashed as well, this time while creating a plasmoid:

*** Linux Gozer 4.9.9-2-ARCH #1 SMP PREEMPT Sat Feb 11 13:29:22 CET 2017 x86_64 GNU/Linux
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.613560] BUG: unable to handle kernel paging request at 0000000000363071
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.613568] IP: [<ffffffff8122c072>] __d_lookup+0x52/0x150
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.613573] PGD ca7f8067 [86123.613574] PUD 0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.613576]
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.613579] Oops: 0000 [#1] PREEMPT SMP
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.613581] Modules linked in: nvidia_uvm(PO) vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack msr tun md4 hmac netconsole nls_utf8 cifs dns_resolver e1000 r8169 nct6775 hwmon_vid arc4 rtl8187 mac80211 cfg80211 mousedev eeprom_93cx6 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul crc32c_intel nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_realtek nvidia(PO) ghash_clmulni_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec aesni_intel eeepc_wmi asus_wmi sparse_keymap snd_hda_core aes_x86_64 iTCO_wdt iTCO_vendor_support rfkill snd_hwdep lrw gf128mul glue_helper snd_pcm ablk_helper mxm_wmi snd_timer cryptd snd evdev input_leds mii led_class mac_hid soundcore mei_me psmouse mei i2c_i801 i2c_smbus intel_cstate intel_rapl_perf lpc_ich wmi fan thermal shpchp tpm_infineon battery tpm_tis tpm_tis_core tpm video button acpi_pad fjes pci_stub ttm drm_kms_helper drm syscopyarea sysfillrect sysimgblt fb_sys_fops fuse sg ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache hid_generic usbhid hid uas usb_storage serio_raw atkbd libps2 ehci_pci ehci_hcd usbcore usb_common i8042 serio nfsv3 nfs_acl nfs lockd grace sr_mod sunrpc cdrom fscache sd_mod ahci libahci libata scsi_mod [last unloaded: vboxdrv]
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614068] CPU: 3 PID: 27717 Comm: firefox Tainted: P           O    4.9.9-2-ARCH #1
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614069] Hardware name: ASUS All Series/Z97-K, BIOS 2604 05/20/2015
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614071] task: ffff880094e60e40 task.stack: ffffc90008044000
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614073] RIP: 0010:[<ffffffff8122c072>]  [<ffffffff8122c072>] __d_lookup+0x52/0x150
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614136] RSP: 0018:ffffc90008047c28  EFLAGS: 00010206
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614137] RAX: ffff880094e60e40 RBX: 0000000000363059 RCX: 000000000000000c
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614139] RDX: ffffc90000030000 RSI: ffffc90008047d50 RDI: ffff88013772d180
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614140] RBP: ffffc90008047c58 R08: ffff88013772d180 R09: ffffc90008047d40
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614141] R10: 393a59a5582d3a66 R11: 0000000a00000000 R12: 00000000c136ef8e
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614143] R13: ffff88013772d180 R14: ffffc90008047d50 R15: ffff8802131341a0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614144] FS:  00007f25a2ff9740(0000) GS:ffff88021ed80000(0000) knlGS:0000000000000000
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614147] CR2: 0000000000363071 CR3: 00000001de0fd000 CR4: 00000000001406e0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614148] Stack:
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614209]  ffffc90008047c60 ffffc90008047d40 0000000000000000 ffffc90008047cd0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614213]  ffffc90008047cc8 ffff8802131341a0 ffffc90008047cb0 ffffffff8121f174
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614217]  ffffffff8121d2f1 ffffc90008047cc4 fefefefefefefeff 000000002fb399ef
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614221] Call Trace:
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614225]  [<ffffffff8121f174>] lookup_fast+0x1b4/0x310
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614227]  [<ffffffff8121d2f1>] ? __inode_permission+0x41/0xc0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614290]  [<ffffffff8121f5b7>] walk_component+0x47/0x2a0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614292]  [<ffffffff8121ff27>] path_lookupat+0x67/0x120
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614295]  [<ffffffff81221f9d>] filename_lookup+0xad/0x140
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614297]  [<ffffffff8122f66f>] ? touch_atime+0xbf/0xd0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614299]  [<ffffffff8120ead4>] ? __check_object_size+0x54/0x1d6
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614302]  [<ffffffff813387dd>] ? strncpy_from_user+0x4d/0x170
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614304]  [<ffffffff81222106>] user_path_at_empty+0x36/0x40
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614306]  [<ffffffff81217666>] vfs_fstatat+0x66/0xc0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614308]  [<ffffffff81217b23>] SyS_newstat+0x33/0x60
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614310]  [<ffffffff812122bb>] ? vfs_read+0x11b/0x130
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614312]  [<ffffffff812137aa>] ? SyS_read+0xaa/0xc0
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614316]  [<ffffffff8160a8f7>] entry_SYSCALL_64_fastpath+0x1a/0xa9
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614317] Code: 44 8b 26 48 8b 15 df a6 8d 00 44 89 e0 d3 e8 48 8d 1c c2 e8 91 a4 eb ff 48 8b 1b 48 83 e3 fe 75 0a eb 30 48 8b 1b 48 85 db 74 28 <44> 3b 63 18 75 f2 4c 8d 7b 50 4c 89 ff e8 bc e2 3d 00 4c 39 6b
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614487] RIP  [<ffffffff8122c072>] __d_lookup+0x52/0x150
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614551]  RSP <ffffc90008047c28>
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614552] CR2: 0000000000363071
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614571] ---[ end trace a0c431c212820c92 ]---
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614573] Kernel panic - not syncing: Fatal exception
giu 30 14:33:31 arch_ups netconsole.sh[208]: [86123.614591] Kernel Offset: disabled
giu 30 14:35:07 arch_ups netconsole.sh[208]: [86123.614593] Rebooting in 120 seconds..[   41.409655] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)

I'd say it is still filesystem related. Does anyone agree?

Now testing 4.9-1, hmm.
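
To compare the two traces field by field, here is a minimal sketch that pulls out the fault address, the faulting function, its module, the task and the kernel version. The regular expressions are written only against the line layouts seen in the two logs in this thread; that they fit other kernel versions is an assumption, and `summarize_oops` is a made-up helper name.

```python
import re

# Patterns matched against the two traces in this thread only (assumption:
# other kernel versions may format these lines differently).
BUG_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
# \b keeps this from matching inside "RIP:"; the [<address>] part is optional
# because 4.11 dropped it.
IP_RE = re.compile(
    r"\bIP: (?:\[<[0-9a-f]+>\] )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)
# The kernel release is the last word before "#<build number>".
CPU_RE = re.compile(r"CPU: \d+ PID: \d+ Comm: (\S+) .*?(\S+) #\d+")


def summarize_oops(text):
    """Return the key fields of one oops; a missing field is None."""
    bug = BUG_RE.search(text)
    ip = IP_RE.search(text)
    cpu = CPU_RE.search(text)
    return {
        "fault_address": int(bug.group(1), 16) if bug else None,
        "function": ip.group(1) if ip else None,
        "module": ip.group(2) if ip else None,  # None when no [module] is shown
        "comm": cpu.group(1) if cpu else None,
        "kernel": cpu.group(2) if cpu else None,
    }


# Three lines copied from the first trace in this thread.
sample = "\n".join([
    "[212126.661528] BUG: unable to handle kernel paging request at 000000000023cbd5",
    "[212126.661546] IP: __es_shrink+0x98/0x2c0 [ext4]",
    "[212126.661610] CPU: 2 PID: 50 Comm: kswapd0 Tainted: P  R        O    4.11.3-1-ARCH #1",
])
print(summarize_oops(sample))
```

On the two logs above, this reports `__es_shrink` in the ext4 module under kswapd0 on 4.11.3-1-ARCH, and `__d_lookup` with no module under firefox on 4.9.9-2-ARCH.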

Last edited by kokoko3k (2017-06-30 12:56:11)

