#1 2022-07-25 00:31:12

buzuddha
Member
Registered: 2020-10-02
Posts: 72

New timeouts on home media/file server after power outage

Hello All!

I'm experiencing weird timeouts with my home server after a power outage. My Arch server had been rock solid for many moons prior. It could serve out media via emby virtually instantly @ 4K with no buffering, and I could read and write to it @ close to 100 MB/s. A recent storm knocked out the power in my area and I haven't been able to get this machine working properly since!

Symptoms:
media service - video plays for a bit then lags then starts up again then fails completely
emby - using the browser to get to the emby server times out frequently
ssh - I can get in, but the pipe will break and I'll be booted before I can issue more than a command or two
samba - usually times out before I can get too many folders deep
ping - still good, less than 2ms to and from the server to other wired nodes. Pinging 8.8.8.8 is ~9ms
iperf3 - no output from client nor server side
pacman - lots of stalling while downloading packages (it should be very fast on a 5800X with 800 Mbit/s internet). Packages download fast, then stop for a bit, then start up again.

I'd love some help if anyone is willing to point me in the right direction.

The server is:
Arch 5.18.14
Wired connection
Files are served via samba (raid and zfs pools appear clean and online)

I did notice that smb.service shows "unable to open new log file"

I've tried unplugging and replugging the LAN connection at the back of the computer. I've tried restarting services. I've tried updating all packages and the kernel. None of this has helped. Lots of free space on the root drive. I can't figure out what's going on here.

I see one issue with smb.service but I'm not sure if this is related. Found this but I'm not sure it applies.

Thanks for any pointers and all the time!!!

Last edited by buzuddha (2022-07-25 00:32:01)

Offline

#2 2022-07-25 00:49:01

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,739

Re: New timeouts on home media/file server after power outage

What is your client? Do you have other clients? If you have multiple clients, do they all suffer the same issues?

As an opener, what is the output of lsblk -f and du on your server?


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way

Offline

#3 2022-07-25 01:17:25

buzuddha
Member
Registered: 2020-10-02
Posts: 72

Re: New timeouts on home media/file server after power outage

Thanks for your reply!

Confirmed clients where I've experienced this:

macbook pro - all issues
mac pro VM - all issues
gnome arch linux bare metal - all issues
kodi on nvidia shield - video issues

lsblk -f output
du command output

Last edited by buzuddha (2022-07-25 01:18:46)

Offline

#4 2022-07-25 01:28:04

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,739

Re: New timeouts on home media/file server after power outage

I really meant df, but no matter.  Things look rational as to disk usage and tree structure.

Is the server fully up to date?
Could you post the output of find /etc/systemd/system/

I am still fishing for hints.



Offline

#5 2022-07-25 01:48:45

buzuddha
Member
Registered: 2020-10-02
Posts: 72

Re: New timeouts on home media/file server after power outage

The server is fully up to date ("there is nothing to do").

Sorry, yes you can see the disk usage in the lsblk -f output. For kicks here's the df -h output. Fairly minimal disk occupancy on the root partition.

Here's the find output

Offline

#6 2022-07-25 06:44:00

seth
Member
Registered: 2012-09-03
Posts: 49,943

Re: New timeouts on home media/file server after power outage

From circumstance and symptoms, I'd check the dmesg/journal for IO errors and certainly "smartctl -a", https://wiki.archlinux.org/title/Smart

iperf3 - no output from client nor server side

???
Try -V and on the server also -Z

Online

#7 2022-07-26 23:26:21

buzuddha
Member
Registered: 2020-10-02
Posts: 72

Re: New timeouts on home media/file server after power outage

Thanks for your reply Seth!

server side iperf3 output (or lack thereof)
the command

iperf3 -s -Z

came back with the message

iperf: option requires an argument -- Z

not totally sure what to do with this one...this option appears to be some kind of zerocopy but the manpage doesn't really explain what that is/means.

iperf3 client side verbose output

smartctl -a /dev/nvme0n1

shows that it passed
I don't really know what normal numbers are for these outputs. Looks like temp sensor 2 is way high, but maybe that's normal??

Hmmm...the output for lm_sensors seems very weird...should the high = be 65000C?

Last edited by buzuddha (2022-07-26 23:38:10)

Offline

#8 2022-07-27 07:00:19

seth
Member
Registered: 2012-09-03
Posts: 49,943

Re: New timeouts on home media/file server after power outage

"iperf: option requires an argument -- Z" is probably some iperf 2.x implementation on the server? (where -Z would be "-Z, --linux-congestion <algo>  set TCP congestion control algorithm (Linux only)")
idk whether you can run iperf3 against iperf2 but there's no data transfer (measured).

Edit: did you check for IO errors?

Last edited by seth (2022-07-27 07:00:49)

Online

#9 2022-07-28 13:09:26

buzuddha
Member
Registered: 2020-10-02
Posts: 72

Re: New timeouts on home media/file server after power outage

Hi Seth,

The whole output for smartctl -a is linked in my post above. Perhaps the notable part of this output was

Error Information Log Entries:      162
Warning  Comp. Temperature Time:    101640
Critical Comp. Temperature Time:    1676
Temperature Sensor 1:               60 Celsius
Temperature Sensor 2:               108 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        162     0  0x9009  0x4004  0x028 10378826846316789759     0     -
  1        161     0  0x0009  0x4005  0x028 2533322035036159     0     -

I don't really yet understand how to interpret the smartctl output but it looks like there are 162 errors. I don't know if this is a lot or a little.
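
One rough way to pull the temperature readings out of that output is sketched below. It assumes smartctl's NVMe text format as quoted above; the function name nvme_temp_check and the default limit of 85 C are made up for illustration. If the drive reports its own warning and critical temperature thresholds in the full smartctl -a output, those are the values to pass in instead.

```shell
# Hypothetical helper: reads `smartctl -a` text on stdin and prints every
# "Temperature Sensor N" reading at or above a limit.
# The default limit of 85 C is an assumed value, not taken from this drive.
nvme_temp_check() {
    limit="${1:-85}"
    awk -F: -v limit="$limit" '
        /^Temperature Sensor [0-9]+:/ {
            split($1, name, " ")      # name[3] is the sensor number
            temp = $2 + 0             # "   108 Celsius" -> 108
            if (temp >= limit)
                printf "sensor %s: %d C (limit %d C)\n", name[3], temp, limit
        }
    '
}

# Usage on the server (needs root):
#   smartctl -a /dev/nvme0n1 | nvme_temp_check 85
```

Run against the two sensor lines quoted above, this would flag only sensor 2 at 108 C.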

For iperf3, I'm not exactly sure what's up. Server side

iperf3 -s -Z

gives the man page. I did try

iperf3 -s

alone and got output from a client

iperf3 -c 192.168.192.151 -V
iperf 3.11
Linux spaceship 5.18.14-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 23 Jul 2022 11:46:17 +0000 x86_64
Control connection MSS 1448
Time: Thu, 28 Jul 2022 13:10:32 GMT
Connecting to host 192.168.192.151, port 5201
      Cookie: mponekttg26skheiwtwyx26wwh2d3ppmlzqr
      TCP MSS: 1448 (default)
[  5] local 192.168.192.89 port 36350 connected to 192.168.192.151 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   959 Mbits/sec    0    464 KBytes       
[  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec    0    486 KBytes       
[  5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec    0    486 KBytes       
[  5]   3.00-4.00   sec   112 MBytes   942 Mbits/sec    0    486 KBytes       
[  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec    0    486 KBytes       
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec    0    486 KBytes       
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec    0    486 KBytes       
[  5]   7.00-8.00   sec   111 MBytes   934 Mbits/sec    0    486 KBytes       
[  5]   8.00-9.00   sec   112 MBytes   942 Mbits/sec    0    486 KBytes       
[  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec    0    486 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver
CPU Utilization: local/sender 1.1% (0.0%u/1.1%s), remote/receiver 32.1% (3.0%u/29.0%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic

iperf Done.

If I try with the zerocopy option

iperf3 -c 192.168.192.151 -Z
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
iperf3: interrupt - the client has terminated
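
For comparing several runs, the sender/receiver summary lines can be boiled down to one figure each. This is only a sketch that assumes iperf3's plain-text summary format as shown in the -V run above; iperf3_summary is a hypothetical name.

```shell
# Hypothetical helper: reads iperf3 text output on stdin and prints the
# bitrate from the final "sender" and "receiver" summary lines.
iperf3_summary() {
    awk '
        $NF == "sender" || $NF == "receiver" {
            # Look for the "<number> <unit>bits/sec" pair on the line
            for (i = 1; i < NF; i++)
                if ($(i + 1) ~ /bits\/sec$/) {
                    printf "%s: %s %s\n", $NF, $i, $(i + 1)
                    break
                }
        }
    '
}

# Usage from a client:
#   iperf3 -c 192.168.192.151 | iperf3_summary       # client sends to server
#   iperf3 -c 192.168.192.151 -R | iperf3_summary    # reverse: server sends to client
```

The -V run above only measured the client sending to the server. Since most of the symptoms involve the server sending data out, the -R (reverse) run may be the more relevant direction to check.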

Last edited by buzuddha (2022-07-28 13:20:56)

Offline

#10 2022-07-28 16:26:38

seth
Member
Registered: 2012-09-03
Posts: 49,943

Re: New timeouts on home media/file server after power outage

The whole output for smartctl -a is linked in my post above

smartctl isn't the same as IO errors - most IO errors (totally objective personal perception…) occur on the bus.
You want to look at dmesg - esp. since iperf3 seems to operate at ~1 Gbit/s

You can also benchmark the critical drive to see whether it's prone to become the bottleneck, https://wiki.archlinux.org/title/Benchmarking#dd
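
A minimal sketch of a dd write test along the lines of that wiki section, assuming GNU dd. The function name dd_write_bench and the temporary file name are invented for illustration; the directory argument decides which drive gets tested.

```shell
# Hypothetical helper: writes SIZE_MB MiB of zeros to a temporary file in DIR,
# prints dd's final throughput line, then removes the file.
dd_write_bench() {
    dir="$1"
    size_mb="${2:-1024}"
    tmpfile="$dir/dd_bench.$$"
    # conv=fdatasync makes dd flush the data to the device before it
    # reports its timing, so the page cache does not inflate the number
    dd if=/dev/zero of="$tmpfile" bs=1M count="$size_mb" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$tmpfile"
}

# Usage on the server, writing 1 GiB to a directory on the drive in question:
#   dd_write_bench /path/on/that/drive 1024
```

Zeros compress very well, so on a filesystem with compression enabled (a ZFS dataset, for instance) the result would likely be far higher than the real device speed.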

Online

#11 2022-08-17 03:01:45

buzuddha
Member
Registered: 2020-10-02
Posts: 72

Re: New timeouts on home media/file server after power outage

ok, sorry, been away from this, but I'm picking it back up.

I benchmarked the drive and it appears to be pretty speedy for an old gen3 NVMe drive. Here are the dd commands issued.

I didn't find any I/O errors from dmesg about the drive.

dmesg | grep error
dmesg | grep input
dmesg | grep IO
dmesg | grep 'I/O'
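
Those greps are case-sensitive and fairly narrow, so timeouts, controller resets, or link flaps would slip through. A broader filter could look something like the sketch below. The function name klog_suspects and the pattern list are assumptions and not exhaustive.

```shell
# Hypothetical helper: filters kernel log text on stdin, case-insensitively,
# for lines that commonly accompany storage or link trouble.
# The pattern list is an assumption and can be extended as needed.
klog_suspects() {
    grep -iE 'i/o error|timeout|timed out|reset|nvme|link is (up|down)'
}

# Usage on the server:
#   dmesg --level=err,warn | klog_suspects            # current boot
#   journalctl -k -b -1 --no-pager | klog_suspects    # previous boot, if the journal is persistent
```

Looking at the previous boot as well may help, since messages from a session that ended badly are not in the current dmesg buffer.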

Last edited by buzuddha (2022-08-17 03:02:58)

Offline
