Network connectivity drops intermittently

Ashex · 2014-04-16 07:38:22

I've got some really weird behavior with my home file server that started early yesterday. It first presented itself with irssi being super slow (occasionally not responded) and I kept getting server disconnects. I poked around journalctl and found this:

pr 15 22:21:01 gaia.haiti systemd[1]: systemd-journald.service watchdog timeout!
Apr 15 22:22:32 gaia.haiti systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Apr 15 22:22:32 gaia.haiti systemd[1]: systemd-journald.service: main process exited, code=killed, status=10/USR1
Apr 15 22:22:32 gaia.haiti systemd[1]: Unit systemd-journald.service entered failed state.
Apr 15 22:22:32 gaia.haiti systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Apr 15 22:22:32 gaia.haiti systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Apr 15 22:22:32 gaia.haiti systemd[1]: systemd-journald.service: main process exited, code=killed, status=10/USR1
Apr 15 22:22:32 gaia.haiti systemd[1]: Unit systemd-journald.service entered failed state.
Apr 15 22:22:32 gaia.haiti systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Apr 15 22:22:32 gaia.haiti systemd[1]: systemd-journald.service: main process exited, code=killed, status=10/USR1
Apr 15 22:22:32 gaia.haiti systemd[1]: Unit systemd-journald.service entered failed state.
Apr 15 22:33:52 gaia.haiti systemd-journal[8929]: Forwarding to syslog missed 8 messages.
Apr 15 22:33:53 gaia.haiti systemd-journal[8929]: Journal stopped
Apr 15 22:33:53 gaia.haiti systemd-journal[9057]: Permanent journal is using 1.3G (max allowed 1.3G, trying to leave 11.7M free of 33.9G available &#8594; current limit 1.3G).
Apr 15 22:33:53 gaia.haiti systemd-journald[8929]: Received SIGTERM from PID 1 (systemd).
Apr 15 22:33:53 gaia.haiti systemd[1]: Unit systemd-journald.service entered failed state.
Apr 15 22:33:53 gaia.haiti systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
Apr 15 22:33:53 gaia.haiti systemd[1]: Stopping Journal Service...
Apr 15 22:33:53 gaia.haiti systemd[1]: Starting Journal Service...
Apr 15 22:33:53 gaia.haiti systemd[1]: Started Journal Service.
Apr 15 22:33:53 gaia.haiti systemd-journal[9057]: Journal started

Which looks a little strange but didn't explain the irssi behavior. On a whim I turned up the front fan on the server as I had seen a couple alerts about the disk temp hitting 40C, within a few minutes munin spewed a bunch of alerts, primarily OK alerts around disk latency which is a little strange. That said, I keep getting alerts about eth0 errors being unknown/a high number.

This morning I looked at munin and there are huge gaps in the charts, 1-2 hours which is really strange. Looking in the munin logs I see errors about connecting to the tcp port. Anyone know what's going on? I've got munin configured to use CGI for the graph generation.

While I was typing this I lost network connectivity to the server (ssh/http) for 5-10 minutes, after I got back in I saw the systemd messages above. Scrolling through my buffer I see the following error appeared regularly yesterday:

[1279012.741565] INFO: task nginx:32084 blocked for more than 120 seconds.
[1279012.741616]       Not tainted 3.13.7-1-ARCH #1
[1279012.741648] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1279012.741701] nginx           D 00000343     0 32084  32083 0x00000000
[1279012.741706]  c0f7fe7c 00200086 c0b4e930 00000343 00000000 c16aca40 2b3cbea6 00048ae4
[1279012.741712]  c16aca40 f4c7c2c0 c0faac80 c0faac80 c0f7fe40 00054878 00000000 c0f7ff14
[1279012.741717]  00000000 00000000 c0b4e930 00000000 c0bfe42c c0bfe490 000005a8 000005a8
[1279012.741722] Call Trace:
[1279012.741731]  [<c1453443>] schedule+0x23/0x60
[1279012.741736]  [<c14537c4>] schedule_preempt_disabled+0x14/0x20
[1279012.741739]  [<c1455725>] __mutex_lock_slowpath+0x105/0x330
[1279012.741742]  [<c1455960>] mutex_lock+0x10/0x1c
[1279012.741747]  [<c1104870>] generic_file_aio_write+0x40/0x90
[1279012.741763]  [<c1156467>] do_sync_write+0x57/0x90
[1279012.741767]  [<c1156410>] ? do_sync_read+0x90/0x90
[1279012.741770]  [<c1156c25>] vfs_write+0x95/0x1c0
[1279012.741773]  [<c11572f1>] SyS_write+0x51/0x90
[1279012.741777]  [<c145d40d>] sysenter_do_call+0x12/0x28

Looking on google suggests this is related to resource starvation.

Output of free -m:

             total       used       free     shared    buffers     cached
Mem:          3234       3111        123          0          3         63
-/+ buffers/cache:       3044        189
Swap:         2047       1047       1000

I do have 8GB installed but it's i686 (legacy system) and so far my attempts to build a kernel with PAE have failed.

Edit: I stopped subsonic which was occupying about 500M Virtual RAM and munin is updating more reliably.
Edit Edit: Still happened. Installed sysstat and it looks like things are still being blocked. Also shows average memory utilization at 95%

Linux 3.13.7-1-ARCH (gaia.haiti) 	04/16/2014 	_i686_	(4 CPU)

10:27:33 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
10:37:33 AM         1       408      1.63      1.74      2.13         0
10:47:33 AM         1       409      1.28      1.39      1.76         0
10:57:38 AM         0       408      2.11      1.70      1.75         4
11:07:38 AM         0       403     11.43      8.34      5.02        14
11:17:38 AM         0       421     15.65     11.50      7.96        16
11:34:57 AM         1       440     29.46     35.64     25.85         4
11:44:57 AM         0       399     16.61     18.16     20.32         3
Average:            0       413     11.17     11.21      9.26         6

Update: Looks to be directly related to Memory utilization, followed this guide on diagnosing high load and appears the system is swapping a bit. I expected it to handle things a little better since I'm using a SSD (the only hardware change in a few months).

Last edited by Ashex (2014-04-16 10:03:31)

Arch Linux

#1 2014-04-16 07:38:22

Network connectivity drops intermittently

Board footer