You are not logged in.

#1 2012-06-25 09:26:51

iv597
Member
From: United States
Registered: 2011-09-18
Posts: 96
Website

Various network-related applications ultimately hard-lock system

Hey, folks! I've had an issue lately that I've tried to shrug off as "botched kernel" or whatever for a while now, but it's gotten consistent enough that I simply can't ignore it. Simply put, some network-related applications will semi-randomly spike the CPU to 100% on one thread, and syslog-ng spikes with it, trying to log everything to /var/log/{errors,everything,kernel,messages}.log. Hence, both cores sit at 100%, the drive gets eaten up, and that, of course, causes issues. Ok, fine, kill the process that caused it, clear the logs, repeat, right? That worked for a while now, but as of today I'm not even getting a chance to do that: the *second* the application hits 100% CPU use and things start getting logged, the system hard-locks entirely. I have no mouse, kb, sound, net, anything. It's just dead. Obviously this is a rather critical issue, big_smile

Some applications that do this:
- rtorrent (spikes almost instantly - I got hardlocks absolutely every single time I launched it (with torrents downloading) within 5 minutes)
- transmissiond (similar story but takes longer to do. Before these spikes started killing the system I could "Pause All" in the web interface and it would stop the madness. This is what clued me into it being a net issue)
- weechat (very rare but does do it on occasion. I haven't checked the logfiles after weechat spikes so for all I know this isn't a related issue)

The chunk that gets dumped to the log files hundreds of times per second to each file (Yeah, easy to spike the CPU doing that...). This varies slightly from time to time. Obviously the offending PID and application change, but the line number and function calls in the second line (WARNING: at...) change on occasion too. Makes me wonder if the whole tcp stack isn't glitching out at this point.

Jun 25 04:40:14 localhost kernel: [  618.054892] ------------[ cut here ]------------
Jun 25 04:40:14 localhost kernel: [  618.054898] WARNING: at net/ipv4/tcp.c:1486 tcp_recvmsg+0x532/0xcf0()
Jun 25 04:40:14 localhost kernel: [  618.054901] Hardware name:  
Jun 25 04:40:14 localhost kernel: [  618.054905] recvmsg bug: copied 699F6933 seq 699F697B rcvnxt 699FB953 fl 0
Jun 25 04:40:14 localhost kernel: [  618.054908] Modules linked in: oss_usb(O) oss_ich(O) osscore(O) nvidia(PO) ppdev parport_pc acpi_cpufreq mperf i2c_core usb_storage uas keucr(C) parport rndis_host shpchp cdc_ether usbnet pci_hotplug microcode mii button psmouse fan thermal processor serio_raw evdev pcspkr sis_agp ext4 crc16 jbd2 mbcache usbhid hid sd_mod sr_mod cdrom sata_sis ohci_hcd ehci_hcd pata_acpi pata_sis ata_generic usbcore libata usb_common scsi_mod floppy
Jun 25 04:40:14 localhost kernel: [  618.054967] Pid: 1063, comm: rtorrent Tainted: P        WC O 3.4.4-1-ARCH #1
Jun 25 04:40:14 localhost kernel: [  618.054971] Call Trace:
Jun 25 04:40:14 localhost kernel: [  618.054977]  [<ffffffff810515cf>] warn_slowpath_common+0x7f/0xc0
Jun 25 04:40:14 localhost kernel: [  618.054984]  [<ffffffff810516c6>] warn_slowpath_fmt+0x46/0x50
Jun 25 04:40:14 localhost kernel: [  618.054990]  [<ffffffff813b12b2>] tcp_recvmsg+0x532/0xcf0
Jun 25 04:40:14 localhost kernel: [  618.054995]  [<ffffffff81115cd5>] ? set_page_dirty+0x65/0x70
Jun 25 04:40:14 localhost kernel: [  618.055003]  [<ffffffff813d2b81>] inet_recvmsg+0x81/0xd0
Jun 25 04:40:14 localhost kernel: [  618.055008]  [<ffffffff8135780d>] sock_recvmsg+0x11d/0x140
Jun 25 04:40:14 localhost kernel: [  618.055015]  [<ffffffff811327f5>] ? handle_pte_fault+0x95/0xab0
Jun 25 04:40:14 localhost kernel: [  618.055022]  [<ffffffff811b1871>] ? ep_send_events_proc+0x81/0x130
Jun 25 04:40:14 localhost kernel: [  618.055027]  [<ffffffff81170097>] ? fget_light+0x97/0x100
Jun 25 04:40:14 localhost kernel: [  618.055032]  [<ffffffff81359dff>] sys_recvfrom+0xef/0x170
Jun 25 04:40:14 localhost kernel: [  618.055039]  [<ffffffff8146a8e9>] system_call_fastpath+0x16/0x1b
Jun 25 04:40:14 localhost kernel: [  618.055043] ---[ end trace adb3216d27c909c2 ]---
Jun 25 04:40:14 localhost kernel: [  618.055054] ------------[ cut here ]------------

I'm using the following relevant packages... (yes, both kernels have been tested, and both do this. I'm aware you don't support -ck kernels, but as both do it, I figured I'd mention them both)

linux 3.4.4-1
linux-ck 3.4.4-1
rtorrent 0.9.2-1
(I removed transmission, so dunno. It was up to date while I used it)

And finally, I'm connected to the internet through an old Windows Mobile 6.1 phone I found laying around, I'm using it as a wireless AP for the time being til' I grab a wireless card for this desktop. This device is connected to a WiFi network, and is tethered over USB using WMWifiRouter. The kernel sees this properly as an RNDIS device and I can use it just like a standard ethernet device. This thing worked fine, with no CPU spikes at all, for months. It was only as of some time during the 3.3 kernels that it started this whole mess.


I'm aware this is a painfully long post, but I figured I'd provide as much info as I could out the gate. Have any of you had this kind of error? Should I file a bug report upstream? Thanks in advance for any help and advice!


Currently running Arch on a Samsung Chromebook Pro (dual booted with ChromeOS), and various VPSes and Docker containers.

Dotfiles on Github

Offline

#2 2012-08-09 15:08:20

Arhat
Member
Registered: 2010-07-02
Posts: 49
Website

Re: Various network-related applications ultimately hard-lock system

Hi,

I wonder,  have you found a solution for your problem?

I'm experiencing similar problems on my HP ProBook 6460b:

Aug  9 16:50:40 itnb3 kernel: [ 4271.518398] ------------[ cut here ]------------
Aug  9 16:50:40 itnb3 kernel: [ 4271.518400] WARNING: at net/ipv4/tcp.c:1486 tcp_recvmsg+0x532/0xcf0()
Aug  9 16:50:40 itnb3 kernel: [ 4271.518402] Hardware name: HP ProBook 6460b
Aug  9 16:50:40 itnb3 kernel: [ 4271.518403] recvmsg bug: copied 8FC311A8 seq 8FC31750 rcvnxt 8FC371D0 fl 0
Aug  9 16:50:40 itnb3 kernel: [ 4271.518405] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_idt uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media cdc_acm cdc_wdm cdc_ncm usbnet mii sdhci_pci joydev microcode ppdev btusb bluetooth arc4 aesni_intel aes_x86_64 aes_generic serio_raw psmouse parport_pc firewire_ohci firewire_core tpm_tis snd_hda_intel sdhci ghash_clmulni_intel crc_itu_t parport tpm_infineon mmc_core mei(C) snd_hda_codec i915 snd_hwdep e1000e snd_pcm i2c_algo_bit snd_page_alloc snd_timer snd drm_kms_helper tun soundcore drm acpi_cpufreq hp_wmi sparse_keymap mperf iTCO_wdt cryptd iTCO_vendor_support coretemp crc32c_intel i2c_core intel_agp hp_accel container video tpm intel_gtt tpm_bios lis3lv02d wmi processor evdev input_polldev ac thermal button battery brcmsmac cordic crc8 brcmutil bcma mac80211 cfg80211 rfkill ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ahci libahci libata ehci_hcd scsi_mod usbcore usb_common
Aug  9 16:50:40 itnb3 kernel: [ 4271.518459] Pid: 16570, comm: pacman Tainted: G        WC   3.4.6-1-ARCH #1
Aug  9 16:50:40 itnb3 kernel: [ 4271.518460] Call Trace:
Aug  9 16:50:40 itnb3 kernel: [ 4271.518462]  [<ffffffff810515df>] warn_slowpath_common+0x7f/0xc0
Aug  9 16:50:40 itnb3 kernel: [ 4271.518465]  [<ffffffff810516d6>] warn_slowpath_fmt+0x46/0x50
Aug  9 16:50:40 itnb3 kernel: [ 4271.518468]  [<ffffffff813b1452>] tcp_recvmsg+0x532/0xcf0
Aug  9 16:50:40 itnb3 kernel: [ 4271.518471]  [<ffffffff813d2d21>] inet_recvmsg+0x81/0xd0
Aug  9 16:50:40 itnb3 kernel: [ 4271.518474]  [<ffffffff81357b3d>] sock_recvmsg+0x11d/0x140
Aug  9 16:50:40 itnb3 kernel: [ 4271.518476]  [<ffffffff81181890>] ? __pollwait+0xf0/0xf0
Aug  9 16:50:40 itnb3 kernel: [ 4271.518479]  [<ffffffff812e19e3>] ? tty_mode_ioctl+0xb3/0x480
Aug  9 16:50:40 itnb3 kernel: [ 4271.518482]  [<ffffffff814675ae>] ? mutex_unlock+0xe/0x10
Aug  9 16:50:40 itnb3 kernel: [ 4271.518484]  [<ffffffff812dcbbd>] ? tty_ioctl+0x49d/0xb40
Aug  9 16:50:40 itnb3 kernel: [ 4271.518487]  [<ffffffff8135a12f>] sys_recvfrom+0xef/0x170
Aug  9 16:50:40 itnb3 kernel: [ 4271.518490]  [<ffffffff81181b2f>] ? poll_select_set_timeout+0x6f/0x90
Aug  9 16:50:40 itnb3 kernel: [ 4271.518492]  [<ffffffff8146ab29>] system_call_fastpath+0x16/0x1b
Aug  9 16:50:40 itnb3 kernel: [ 4271.518494] ---[ end trace 0afeb19e87dc1e38 ]---

Offline

#3 2012-08-15 07:43:43

Revelation60
Member
From: The Netherlands
Registered: 2009-03-19
Posts: 158
Website

Re: Various network-related applications ultimately hard-lock system

I have the same problem:

2012-08-12T16:25:08.006546+02:00 localhost kernel: [ 2939.104809] WARNING: at net/ipv4/tcp.c:1486 tcp_recvmsg+0x532/0xcf0()
2012-08-12T16:25:08.006547+02:00 localhost kernel: [ 2939.104811] Hardware name: To Be Filled By O.E.M.
2012-08-12T16:25:08.006548+02:00 localhost kernel: [ 2939.104813] recvmsg bug: copied 1574FBA6 seq 1574FC60 rcvnxt 1574FC61 fl 0
2012-08-12T16:25:08.006552+02:00 localhost kernel: [ 2939.104815] Modules linked in: usbhid hid nvidia(PO) snd_pcm_oss snd_mixer_oss microcode i2c_i801 iTCO_wdt r8169 i2c_core iTCO_vendor_support mii aesni_intel aes_x86_64 aes_generic ghash_clmulni_intel cryptd coretemp snd_hda_codec_hdmi acpi_cpufreq mperf xhci_hcd ehci_hcd evdev processor button mei(C) intel_agp intel_gtt crc32c_intel snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore fuse r8712u(C) usbcore usb_common nfs nfs_acl lockd auth_rpcgss sunrpc fscache ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ahci libahci libata scsi_mod
2012-08-12T16:25:08.006553+02:00 localhost kernel: [ 2939.104868] Pid: 1327, comm: Chrome_IOThread Tainted: P        WC O 3.4.8-1-ARCH #1
2012-08-12T16:25:08.006553+02:00 localhost kernel: [ 2939.104870] Call Trace:
2012-08-12T16:25:08.006554+02:00 localhost kernel: [ 2939.104872]  [<ffffffff810515bf>] warn_slowpath_common+0x7f/0xc0
2012-08-12T16:25:08.006555+02:00 localhost kernel: [ 2939.104875]  [<ffffffff810516b6>] warn_slowpath_fmt+0x46/0x50
2012-08-12T16:25:08.006556+02:00 localhost kernel: [ 2939.104878]  [<ffffffff813b1522>] tcp_recvmsg+0x532/0xcf0
2012-08-12T16:25:08.006557+02:00 localhost kernel: [ 2939.104881]  [<ffffffff81084bbd>] ? try_to_wake_up+0x1dd/0x2e0
2012-08-12T16:25:08.006558+02:00 localhost kernel: [ 2939.104884]  [<ffffffff813d2e21>] inet_recvmsg+0x81/0xd0
2012-08-12T16:25:08.006559+02:00 localhost kernel: [ 2939.104887]  [<ffffffff810135c3>] ? __switch_to+0x163/0x410
2012-08-12T16:25:08.006559+02:00 localhost kernel: [ 2939.104891]  [<ffffffff81358a82>] sock_aio_read.part.8+0x142/0x170
2012-08-12T16:25:08.006560+02:00 localhost kernel: [ 2939.104894]  [<ffffffff814688d1>] ? __schedule+0x431/0x900
2012-08-12T16:25:08.006561+02:00 localhost kernel: [ 2939.104897]  [<ffffffff81358ad5>] sock_aio_read+0x25/0x40
2012-08-12T16:25:08.006562+02:00 localhost kernel: [ 2939.104900]  [<ffffffff8116e466>] do_sync_read+0xe6/0x120
2012-08-12T16:25:08.006563+02:00 localhost kernel: [ 2939.104903]  [<ffffffff811b2613>] ? ep_scan_ready_list.isra.5+0x193/0x1a0
2012-08-12T16:25:08.006564+02:00 localhost kernel: [ 2939.104907]  [<ffffffff811f8bc4>] ? security_file_permission+0x94/0xb0
2012-08-12T16:25:08.006565+02:00 localhost kernel: [ 2939.104910]  [<ffffffff8116e931>] ? rw_verify_area+0x61/0xf0
2012-08-12T16:25:08.006566+02:00 localhost kernel: [ 2939.104912]  [<ffffffff8116ee7d>] vfs_read+0x15d/0x180
2012-08-12T16:25:08.006567+02:00 localhost kernel: [ 2939.104915]  [<ffffffff8116eeea>] sys_read+0x4a/0xa0
2012-08-12T16:25:08.006567+02:00 localhost kernel: [ 2939.104918]  [<ffffffff811b365e>] ? sys_epoll_wait+0x8e/0xf0
2012-08-12T16:25:08.006568+02:00 localhost kernel: [ 2939.104921]  [<ffffffff8146aca9>] system_call_fastpath+0x16/0x1b
2012-08-12T16:25:08.006569+02:00 localhost kernel: [ 2939.104923] ---[ end trace 4f051c2cb877c149 ]---

This message is repeated so many times that within the hour my /var/log/messages has filled up all the remaining disk space. It doesn't seem to happen always, though.

Offline

#4 2012-08-25 10:28:32

Arhat
Member
Registered: 2010-07-02
Posts: 49
Website

Re: Various network-related applications ultimately hard-lock system

Revelation60 wrote:

I have the same problem:

....

This message is repeated so many times that within the hour my /var/log/messages has filled up all the remaining disk space. It doesn't seem to happen always, though.

Yes, the same here, I accumulate a couple of GB of logs as the error is not only logged in /var/log/messages.log but also in /var/log/kernel.log, /var/log/user.log, /var/log/errors.log and /var/log/everything.log.


Also the line containing comm: xxxx Tainted:

Pid: 1327, comm: Chrome_IOThread Tainted: P        WC O 3.4.8-1-ARCH #1

varies. Sometimes its pacman, sometimes its Firefox oder Thunderbird.

I'm not yet 100% sure however it seems that in my case the error mostly crops up when I'm on a wwan connection.

Offline

#5 2012-09-23 10:04:15

Revelation60
Member
From: The Netherlands
Registered: 2009-03-19
Posts: 158
Website

Re: Various network-related applications ultimately hard-lock system

I am still having this serious issue. Maybe it's a bug in the driver. What's your device? Mine is RTL8191S WLAN Adapter.

edit: I've filed a bug report: https://bugs.archlinux.org/task/31658

Last edited by Revelation60 (2012-09-23 13:21:41)

Offline

Board footer

Powered by FluxBB