#1 2021-04-03 06:05:27

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Nvidia Driver causing input output error

Hello,

I have a computer with an Nvidia graphics card which used to run the proprietary nvidia driver.
Some time ago my hard drive started losing data, so I had to recover what I could from it and moved to an SSD.
I did a fresh install of Arch on the SSD, and ever since then I've been getting input/output errors.

Every time it happens I have to reboot to get my system working again.
From all the times it has happened, I have recognised a pattern:
I run a program with some form of hardware acceleration (games like Minecraft or Factorio).
I get an input/output error.
I wrote the journal to a file to debug what is happening, and what I see is that while these programs are running, my SSD gets I/O errors for some reason. Someone on #archlinux said that after a few I/O errors on the drive, the kernel remounts the filesystem read-only.
And I saw that in the journal.
As a result, all my partitions end up mounted read-only and my system becomes unusable.

To stress the point: this only happens when some kind of game is running.
If I leave my desktop running for 5 hours, nothing happens.
If I keep my CPU at 100% for 5 hours, nothing happens.
The I/O error only occurs after a game has been running for somewhere between 30 minutes and 2 hours.

People on #archlinux told me to check the SSD, so I did. I ran S.M.A.R.T. tests.
Everything was fine. I ran fsck and everything was clean. All my data is intact too.

I updated the kernel and drivers and the problem still persists.
I switched to the nouveau driver and everything works fine, except that the nouveau driver is broken: it sometimes hangs the PC and breaks Xorg.

So I'm pretty sure it is the fault of the driver.


Details:
Kernel: 5.10.24-1-lts
Nvidia-dkms: 460.67

Help?

Offline

#2 2021-04-03 09:26:04

seth
Member
Registered: 2012-09-03
Posts: 19,805

Re: Nvidia Driver causing input output error

I'm pretty sure it's temperature or power consumption. Or is the SSD actually an NVMe?
Also https://bbs.archlinux.org/viewtopic.php?id=57855 - "an input output error" is equivalent to "somehow doesn't work"
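
To check the temperature theory while a game is running, something like the following could be left open in a terminal that was started before the game. It is only a rough sketch: it assumes the sensors you care about show up under /sys/class/hwmon (for a SATA SSD that usually means the drivetemp kernel module is loaded), and the function name is made up for illustration. As far as I know the proprietary nvidia driver does not register a hwmon device, so the GPU temperature would have to be read separately, e.g. with nvidia-smi.

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {(chip_name, sensor_label): degrees_celsius} for every hwmon temperature input."""
    temps = {}
    for chip in sorted(Path(base).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_input in sorted(chip.glob("temp*_input")):
            # An optional tempN_label file carries a human-readable sensor name.
            label_file = chip / temp_input.name.replace("_input", "_label")
            label = label_file.read_text().strip() if label_file.exists() else temp_input.name
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                millidegrees = int(temp_input.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors fail when read; skip them
            temps[(chip_name, label)] = millidegrees / 1000.0
    return temps
```

Calling read_hwmon_temps() every few seconds and printing the result should show whether anything climbs shortly before the drive drops out.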

Offline

#3 2021-04-05 09:41:28

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Re: Nvidia Driver causing input output error

Thanks for the reply,

I don't think my ssd is an NVMe.

I got these logs after the I/O error occurred.

I switched to the Nvidia Drivers, ran a game and in about 45 minutes I got an I/O error.
No new programs could be run. I had a terminal open before, so I ran ls and it just hung.
This happens when I get an I/O error.

I got these logs:
Dmesg in follow mode: log
Journalctl in follow mode (This is only from when I ran the command which is after reaching desktop): log
Xorg: log
Output of smartctl -a: log

I can't see any errors in these logs but that just might be me.

Offline

#4 2021-04-05 12:55:20

seth
Member
Registered: 2012-09-03
Posts: 19,805

Re: Nvidia Driver causing input output error

195 Hardware_ECC_Recovered  0x0000   100   100   000    Old_age   Offline      -       140799
…
199 UDMA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       37

Do these numbers rise w/ the IO errors?
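
One way to answer that without eyeballing two dumps: save the attribute table before and after a game session and diff the raw values. A minimal Python sketch, assuming the usual ten-column layout of the smartctl attribute table (the two rows quoted above follow it); the function names are invented for this example.

```python
def parse_smart_raw_values(smartctl_output):
    """Map attribute name -> integer RAW_VALUE for rows of the SMART attribute table."""
    values = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            values[parts[1]] = int(parts[9])
    return values


def changed_attributes(before, after):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    return {
        name: (before[name], after[name])
        for name in after
        if name in before and before[name] != after[name]
    }


# The two rows quoted in this post, used as the "before" snapshot.
before_text = """\
195 Hardware_ECC_Recovered  0x0000   100   100   000    Old_age   Offline      -       140799
199 UDMA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       37
"""
before = parse_smart_raw_values(before_text)
```

Take a second snapshot after the error (saved somewhere other than the affected disk), parse it the same way and pass both dicts to changed_attributes().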

There're no IO errors at all in those logs?
What exactly are you talking about? (If in doubt, take a photo if you can't preserve the information otherwise, and only post a link - the board has a 200x200px limit)

Offline

#5 2021-04-07 09:56:00

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Re: Nvidia Driver causing input output error

Hardware_ECC_Recovered has increased by 1 after I tested but I cannot guarantee that it was caused by the I/O error.

The I/O error logs are not written to the disk, so I had to follow dmesg and the journal on the screen.
After running a game for 10-20 min, I could see error messages and the game froze. I couldn't run any new commands,
but I could still see logs being printed to the terminal.
However, I couldn't copy them because of the I/O error.
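
For anyone hitting the same wall: since a process that is already running keeps working, one option is to start a small forwarder before launching the game and send every kernel log line to a second machine over UDP. This is only a sketch under assumptions: the host and port are placeholders, the function names are made up, and dmesg may need root if kernel.dmesg_restrict is set. The kernel's own netconsole module does a similar job without any userspace involved.

```python
import socket
import subprocess


def forward_lines(lines, send):
    """Pass every non-empty line to send(); return how many lines were forwarded."""
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line:
            send(line)
            count += 1
    return count


def udp_sender(host, port):
    """Return a send(line) callable that ships each line as one UDP datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(line):
        sock.sendto(line.encode("utf-8", errors="replace"), (host, port))

    return send


def follow_kernel_log(host, port):
    """Stream `dmesg --follow` to host:port until dmesg exits or is interrupted."""
    proc = subprocess.Popen(["dmesg", "--follow"], stdout=subprocess.PIPE, text=True)
    return forward_lines(proc.stdout, udp_sender(host, port))
```

Run follow_kernel_log("192.0.2.10", 5555) (placeholder address and port) before starting the game, with any UDP listener on the other machine writing what it receives to a file there.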

I ended up taking photos of the errors.

Here is a tar containing all the photos. Because of how I took them, the dmesg and journal logs got mixed up, but I think it is pretty easy to tell which is which.
The ordering is also gone. Sorry.
error_log.tar

Offline

#6 2021-04-07 11:59:50

seth
Member
Registered: 2012-09-03
Posts: 19,805

Re: Nvidia Driver causing input output error

The drive ceases to respond, even on soft and hard resets.
The SMART data doesn't suggest that this is a broken drive.
There's little chance that this is bus interference, there's also no problem recorded w/ the nvidia blob and your GPU seems fine.

My money is still on power.
Can you force the nvidia driver to operate in "desktop" mode or underclock it and see whether you can still trigger the problem?

Offline

#7 2021-04-07 13:28:43

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Re: Nvidia Driver causing input output error

I don't know how to run in desktop mode but I'll try to underclock and see if that helps

Offline

#8 2021-04-07 13:47:03

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Re: Nvidia Driver causing input output error

I used nvidia-settings to underclock.
I decreased the clock by 105 MHz (the maximum allowed).
I decreased the memory transfer rate by 500.

I played xonotic and I got the same I/O error.

Offline

#9 2021-04-07 14:00:02

seth
Member
Registered: 2012-09-03
Posts: 19,805

Re: Nvidia Driver causing input output error

GPU GeForce GT 730 - that's a 38W possibly even passive GPU anyway… just make sure it's in the proper PCIe slot and that no cables touch the cooler…

You could try the 390xx legacy driver to narrow it down :\

https://aur.archlinux.org/packages/nvidia-390xx-dkms/
https://aur.archlinux.org/packages/nvidia-390xx-utils/

Offline

#10 2021-04-07 14:23:49

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Re: Nvidia Driver causing input output error

I switched from the main branch to 390xx but it still caused the same I/O error. :(

Offline

#11 2021-04-07 14:26:38

seth
Member
Registered: 2012-09-03
Posts: 19,805

Re: Nvidia Driver causing input output error

Not the kernel, not the driver (version), not the disk, …

seth wrote:

make sure it's in the proper PCIe slot and that no cables touch the cooler

Offline

#12 2021-04-07 15:03:59

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Re: Nvidia Driver causing input output error

I'm still on the 390xx branch.

I removed the gpu and put it back in and made sure everything is connected properly.
The gpu has a small fan and there are no cables near it blocking it.

I tried the Xonotic test again and the same I/O error occurred.

One thing I noticed in dmesg after boot is this:

[   16.730326] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000dffff window]
[   16.730692] caller _nv001015rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

Does this have anything to do with the problem?

Offline

#13 2021-04-07 15:12:54

seth
Member
Registered: 2012-09-03
Posts: 19,805

Re: Nvidia Driver causing input output error

https://forums.developer.nvidia.com/t/d … bars/58088
but
https://forums.developer.nvidia.com/t/n … 14/55670/8

Also everybody gets that and your problem is suspiciously reproducible…

nb. that not all PCIe slots are equal, https://en.wikipedia.org/wiki/PCI_Express
Consult your board manual about this (though this should™ crash the GPU rather than the SATA bus…)

Offline

#14 2021-04-07 15:29:15

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Re: Nvidia Driver causing input output error

I read the links and there seem to be no answers to fix it.

All PCIe slots are equal when you have only one standard slot.

There is no board manual.

What should I do? :(

Offline

#15 2021-04-07 16:24:05

seth
Member
Registered: 2012-09-03
Posts: 19,805

Re: Nvidia Driver causing input output error

Do you have a second SATA port on the board?
Btw, "[    0.000000] DMI: To Be Filled By O.E.M. To Be Filled By O.E.M./G41, BIOS 080015  10/08/2016" - wtf… what board is that actually? Are there still BIOS updates available?
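
If it helps with identifying the board: the same DMI strings can be read one field at a time from sysfs. A rough sketch, assuming the usual /sys/class/dmi/id layout (the function name is made up). If the vendor never filled in the table, most fields will show the same placeholders as that dmesg line, but the BIOS vendor and version may still be worth searching for.

```python
from pathlib import Path

# World-readable DMI fields exposed by the kernel; serial numbers are left out
# because they normally require root.
DMI_FIELDS = (
    "sys_vendor",
    "product_name",
    "board_vendor",
    "board_name",
    "bios_vendor",
    "bios_version",
    "bios_date",
)


def read_dmi(base="/sys/class/dmi/id"):
    """Return {field: value or None} for the DMI fields listed above."""
    info = {}
    for field in DMI_FIELDS:
        try:
            info[field] = (Path(base) / field).read_text().strip()
        except OSError:
            info[field] = None  # field missing or unreadable on this system
    return info
```

Printing read_dmi() gives one line per field, which is easier to paste into a search than the combined dmesg string.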

Offline

#16 2021-04-08 09:35:17

megamind6155
Member
Registered: 2019-01-16
Posts: 26

Re: Nvidia Driver causing input output error

I have 4 SATA ports on the board. The SSD was plugged into port 3, so I tried ports 1, 2 and 4, but no luck.
I got the same error.

The board is an obscure, cheap motherboard, probably from China. It has a BIOS and I don't think there are any updates available.

Offline
