Hello All!
I'm experiencing weird timeouts with my home server after a power outage. My Arch server had been rock solid for many moons prior. It could serve media via Emby virtually instantly at 4K with no buffering, and I could read and write to it at close to 100 MB/s. A recent storm knocked out the power in my area, and I haven't been able to get this machine working properly since!
Symptoms:
media service - video plays for a bit, then lags, then starts up again, then fails completely
emby - using the browser to get to the Emby server times out frequently
ssh - I can get in, but the pipe breaks and I get booted before I can issue more than a command or two
samba - usually times out before I can get too many folders deep
ping - still good: less than 2 ms to and from the server to other wired nodes; pinging 8.8.8.8 is ~9 ms
iperf3 - no output from either the client or the server side
pacman - lots of stalling while downloading packages (it should be very fast: 5800X and 800 Mbit/s internet). Downloads run fast, then stop for a bit, then start up again.
I'd love some help if anyone is willing to point me in the right direction.
The server is:
Arch Linux, kernel 5.18.14
Wired connection
Files are served via Samba (RAID and ZFS pools appear clean and online)
I did notice that smb.service shows "unable to open new log file"
I've tried unplugging and replugging the LAN connection at the back of the computer. I've tried restarting services. I've tried updating all packages and the kernel. None of this has helped. Lots of free space on the root drive. I can't figure out what's going on here.
I see one issue with smb.service, but I'm not sure it's related. I found this, but I'm not sure it applies.
Thanks for any pointers and all the time!!!
Last edited by buzuddha (2022-07-25 00:32:01)
What is your client? Do you have other clients? If you have multiple clients, do they all suffer the same issues?
As an opener, what is the output of lsblk -f and du on your server?
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way
Thanks for your reply!
Confirmed clients where I've experienced this:
MacBook Pro - all issues
Mac Pro VM - all issues
GNOME Arch Linux, bare metal - all issues
Kodi on NVIDIA Shield - video issues
lsblk -f output
du command output
Last edited by buzuddha (2022-07-25 01:18:46)
I really meant df, but no matter. Things look rational as to disk usage and tree structure.
Is the server fully up to date?
Could you post the output of find /etc/systemd/system/?
I am still fishing for hints.
The server is fully up to date ("there is nothing to do").
Sorry, yes, you can see the disk usage in the lsblk -f output. For kicks, here's the df -h output: fairly minimal disk occupancy on the root partition.
Here's the find output
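For comparison, here's a rough Python sketch of what one row of df -h reports, using the standard library's shutil.disk_usage(). The helper name df_like and the GiB rounding are my own choices; the Use% line assumes GNU df's convention of used / (used + available), rounded up.

```python
import shutil


def df_like(path: str = "/") -> dict:
    """Size/Used/Avail/Use% for the filesystem holding `path`, in GiB,
    roughly like one row of `df -h`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    gib = 1024 ** 3
    # `free` is the space available to unprivileged users (reserved blocks
    # excluded), which is what df shows as Avail.
    # Use% = used / (used + avail), rounded up.
    denom = usage.used + usage.free
    use_pct = 0 if denom == 0 else -(-usage.used * 100 // denom)  # ceiling division
    return {
        "size_gib": round(usage.total / gib, 1),
        "used_gib": round(usage.used / gib, 1),
        "avail_gib": round(usage.free / gib, 1),
        "use_pct": use_pct,
    }


if __name__ == "__main__":
    print(df_like("/"))
```

Running it against / should give numbers close to the df -h row for the root partition, apart from rounding differences.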
From circumstance and symptoms, I'd check the dmesg/journal for IO errors and certainly "smartctl -a", https://wiki.archlinux.org/title/Smart
iperf3 - no output from client nor server side
???
Try -V and on the server also -Z
Thanks for your reply Seth!
server side iperf3 output (or lack thereof)
the command
iperf3 -s -Z
came back with the message
iperf: option requires an argument -- Z
Not totally sure what to do with this one. This option appears to enable some kind of zero-copy mode, but the man page doesn't really explain what that is or means.
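For reference, the iperf3 man page describes -Z/--zerocopy as using a "zero copy" method of sending data, such as sendfile(2), instead of the usual write(2). A minimal Python sketch of that idea, assuming Linux: socket.sendfile() uses os.sendfile() where it is available, so the kernel moves the file data into the socket without it passing through a user-space buffer. The helper send_zero_copy and the local socket pair are illustrative only, not how iperf3 is implemented.

```python
import socket
import tempfile
import threading


def send_zero_copy(payload: bytes) -> bytes:
    """Write `payload` to a temp file, send that file over a local stream
    socket with socket.sendfile(), and return what the other end received."""
    sender, receiver = socket.socketpair()  # connected stream sockets, local only
    received = bytearray()

    def drain() -> None:
        # Read until the sender closes its end, so sendfile never blocks forever.
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        # When os.sendfile() is available, the kernel copies the file straight
        # into the socket; read() + send() would go through a user-space buffer.
        sender.sendfile(f)
    sender.close()
    reader.join()
    receiver.close()
    return bytes(received)
```

Usage: send_zero_copy(b"hello") returns b"hello"; the point is only where the copying happens, not the result.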
iperf3 client side verbose output
smartctl -a /dev/nvme0n1
shows that it passed
I don't really know what normal numbers are for these outputs. It looks like temperature sensor 2 is way high, but maybe that's normal?
Hmm, the lm_sensors output seems very weird. Should the "high =" value really be 65000°C?
Last edited by buzuddha (2022-07-26 23:38:10)
"iperf: option requires an argument -- Z" is probably some iperf 2.x implementation on the server? (where -Z would be "-Z, --linux-congestion <algo> set TCP congestion control algorithm (Linux only)")
I don't know whether you can run iperf3 against iperf2, but no data transfer was measured.
Edit: did you check for IO errors?
Last edited by seth (2022-07-27 07:00:49)
Hi Seth,
The whole output of smartctl -a is linked in my post above. Perhaps the notable part of this output was:
Error Information Log Entries: 162
Warning Comp. Temperature Time: 101640
Critical Comp. Temperature Time: 1676
Temperature Sensor 1: 60 Celsius
Temperature Sensor 2: 108 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num  ErrCount  SQId   CmdId  Status  PELoc                   LBA  NSID  VS
  0       162     0  0x9009  0x4004  0x028  10378826846316789759     0   -
  1       161     0  0x0009  0x4005  0x028      2533322035036159     0   -
I don't yet really understand how to interpret the smartctl output, but it looks like there are 162 errors. I don't know if that's a lot or a little.
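Some arithmetic on the counters above, as a Python sketch. Two assumptions, both from my reading of the NVMe specification rather than anything in this thread: the Warning/Critical Composite Temperature Time counters are in minutes, and the Status column is the 16-bit error-log status with the phase tag in bit 0, the Status Code (SC) in bits 8:1 and the Status Code Type (SCT) in bits 11:9. parse_counters and decode_status are hypothetical helper names.

```python
import re

# Lines copied from the smartctl -a excerpt above.
SMART_EXCERPT = """\
Error Information Log Entries: 162
Warning Comp. Temperature Time: 101640
Critical Comp. Temperature Time: 1676
Temperature Sensor 1: 60 Celsius
Temperature Sensor 2: 108 Celsius
"""


def parse_counters(text: str) -> dict:
    """Collect 'Name: <integer>' pairs from smartctl text output."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"^([^:]+):\s+(\d+)", line)
        if m:
            counters[m.group(1).strip()] = int(m.group(2))
    return counters


def decode_status(status: int) -> dict:
    """Split an NVMe error-log Status value (assumed layout, phase tag in bit 0)."""
    return {
        "phase": status & 0x1,
        "sc": (status >> 1) & 0xFF,        # Status Code
        "sct": (status >> 9) & 0x7,        # Status Code Type (0 = generic)
        "more": bool(status & (1 << 14)),  # More information available
        "dnr": bool(status & (1 << 15)),   # Do Not Retry
    }


if __name__ == "__main__":
    c = parse_counters(SMART_EXCERPT)
    warn_h = c["Warning Comp. Temperature Time"] / 60  # minutes -> hours (assumed unit)
    crit_h = c["Critical Comp. Temperature Time"] / 60
    print(f"warning: {warn_h:.0f} h ({warn_h / 24:.1f} days), critical: {crit_h:.1f} h")
    for s in (0x4004, 0x4005):
        print(hex(s), decode_status(s))
```

If those assumptions hold, that's about 1694 hours (roughly 70 days) above the warning threshold and about 28 hours above critical, and both logged entries decode to SCT 0 / SC 0x02, which the spec lists as the generic "Invalid Field in Command" status rather than a media error.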
For iperf3, I'm not exactly sure what's up. Server side,
iperf3 -s -Z
gives the man page. I did try
iperf3 -s
alone and got output from a client
iperf3 -c 192.168.192.151 -V
iperf 3.11
Linux spaceship 5.18.14-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 23 Jul 2022 11:46:17 +0000 x86_64
Control connection MSS 1448
Time: Thu, 28 Jul 2022 13:10:32 GMT
Connecting to host 192.168.192.151, port 5201
Cookie: mponekttg26skheiwtwyx26wwh2d3ppmlzqr
TCP MSS: 1448 (default)
[ 5] local 192.168.192.89 port 36350 connected to 192.168.192.151 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 959 Mbits/sec 0 464 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 942 Mbits/sec 0 486 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 942 Mbits/sec 0 486 KBytes
[ 5] 3.00-4.00 sec 112 MBytes 942 Mbits/sec 0 486 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 942 Mbits/sec 0 486 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 942 Mbits/sec 0 486 KBytes
[ 5] 6.00-7.00 sec 112 MBytes 942 Mbits/sec 0 486 KBytes
[ 5] 7.00-8.00 sec 111 MBytes 934 Mbits/sec 0 486 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 942 Mbits/sec 0 486 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 942 Mbits/sec 0 486 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 943 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.10 GBytes 941 Mbits/sec receiver
CPU Utilization: local/sender 1.1% (0.0%u/1.1%s), remote/receiver 32.1% (3.0%u/29.0%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic
iperf Done.
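Why ~941 to 943 Mbits/sec rather than 1000: with the 1448-byte MSS that iperf3 reports, each full segment rides in a 1500-byte IP packet, and every Ethernet frame also spends bytes on its header, FCS, preamble and inter-frame gap. A Python sketch of that arithmetic; tcp_goodput_mbps is a made-up helper that assumes standard 1500-byte MTU framing and ignores ACKs and any other traffic.

```python
def tcp_goodput_mbps(link_mbps: float = 1000.0, mtu: int = 1500, mss: int = 1448) -> float:
    """Best-case TCP payload rate, in Mbit/s, for a stream of full-sized
    segments over Ethernet (ignores ACKs, retransmits and other traffic)."""
    # Each IP packet of `mtu` bytes also costs, on the wire:
    # 14 (Ethernet header) + 4 (FCS) + 8 (preamble + SFD) + 12 (inter-frame gap).
    wire_bytes_per_frame = mtu + 14 + 4 + 8 + 12
    # Only `mss` bytes per frame are TCP payload; with MTU 1500 the other 52
    # bytes are the IP (20) and TCP (20) headers plus 12 bytes of TCP options,
    # typically timestamps.
    return link_mbps * mss / wire_bytes_per_frame


if __name__ == "__main__":
    print(f"{tcp_goodput_mbps():.1f} Mbit/s")
```

That works out to about 941.5 Mbit/s, so the receiver's 941 Mbits/sec in the run above is essentially gigabit line rate for those 10 seconds.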
If I try with the zerocopy option
iperf3 -c 192.168.192.151 -Z
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
iperf3: interrupt - the client has terminated
Last edited by buzuddha (2022-07-28 13:20:56)
The whole output for smartctl -a is linked in the my post above
smartctl isn't the same as IO errors - most IO errors (totally objective personal perception…) occur on the bus.
You want to look at dmesg, esp. since iperf3 seems to operate at ~1 Gbit/s.
You can also benchmark the critical drive to see whether it's prone to become the bottleneck, https://wiki.archlinux.org/title/Benchmarking#dd
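A rough Python analogue of a dd write test, for reference: it writes blocks of zeros to a temporary file and calls fsync() before stopping the clock, similar in spirit to dd with conv=fdatasync. The function name write_throughput and the default block size and count are arbitrary illustration choices, not the wiki's recipe.

```python
import os
import tempfile
import time


def write_throughput(directory: str, block_size: int = 1 << 20, count: int = 256) -> float:
    """Write `count` zero-filled blocks to a temp file in `directory`, flush
    them to stable storage, and return the rate in MB/s (10**6 bytes/s)."""
    block = bytes(block_size)  # zero-filled, like reading from /dev/zero
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
        os.fsync(fd)  # include the flush to disk in the timing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return block_size * count / elapsed / 1e6
```

One caveat: if the target is a ZFS dataset with compression enabled, zeros compress to almost nothing, so a zero-filled test can overstate throughput there.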
OK, sorry, I've been away from this, but I'm picking it back up.
I benchmarked the drive, and it appears to be pretty speedy for an old Gen3 NVMe drive. Here are the dd commands I issued.
I didn't find any I/O errors from dmesg about the drive.
dmesg | grep error
dmesg | grep input
dmesg | grep IO
dmesg | grep 'I/O'
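Note that grep is case-sensitive, so grep error won't match "Error" or "ERROR", and grep IO won't match "io". Adding -i, e.g. dmesg | grep -iE 'error|fail|timeout|reset', casts a wider net. The same filter as a Python sketch; the keyword list is my own guess at useful patterns and is not exhaustive, and kernel_log assumes journalctl is available, returning an empty string otherwise.

```python
import re
import subprocess

# Illustrative keyword list (an assumption, not exhaustive) for kernel
# messages that often accompany storage or bus trouble.
SUSPICIOUS = re.compile(r"error|fail|timeout|timed out|reset", re.IGNORECASE)


def suspicious_lines(log_text: str) -> list:
    """Return the lines of `log_text` that match SUSPICIOUS, ignoring case."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


def kernel_log() -> str:
    """Kernel messages for the current boot via `journalctl -k`; returns an
    empty string if journalctl is missing or cannot be run."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "--no-pager"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return ""
    return result.stdout
```

Usage: print("\n".join(suspicious_lines(kernel_log()))).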
Last edited by buzuddha (2022-08-17 03:02:58)