Hi
When I install NVIDIA GPUs in my machine, AER errors keep showing up, like this:
nvidia 0000:01:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
I tried
• BIOS versions 1.1, 3.5 (latest)
• Several PCI slots
• Several riser cables
• Several GPUs (Inno3D RTX 4090, MSI RTX 3090, Gigabyte RTX 3090)
What could be the cause, and what can I try?
My system
Linux 6.3.9-arch1-1 x86_64
Asrock rack ROMED8-2T
AMD EPYC 7402P
Samsung 980PRO nvme
lspci -tv | head
-+-[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
| +-01.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
| +-01.1-[01]--+-00.0 NVIDIA Corporation AD102 [GeForce RTX 4090]
| | \-00.1 NVIDIA Corporation AD102 High Definition Audio Controller
| +-02.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
| +-03.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
| +-03.5-[02]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
| +-04.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
| +-05.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
| +-07.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
Error message
> dmesg
[ 1170.658532] {83}[Hardware Error]: bridge: secondary_status: 0xc000, control: 0x0000
[ 1170.658533] {83}[Hardware Error]: Error 1, type: corrected
[ 1170.658534] {83}[Hardware Error]: section_type: PCIe error
[ 1170.658534] {83}[Hardware Error]: port_type: 0, PCIe end point
[ 1170.658535] {83}[Hardware Error]: version: 0.2
[ 1170.658536] {83}[Hardware Error]: command: 0x0006, status: 0x0010
[ 1170.658537] {83}[Hardware Error]: device_id: 0000:01:00.1
[ 1170.658537] {83}[Hardware Error]: slot: 0
[ 1170.658538] {83}[Hardware Error]: secondary_bus: 0x00
[ 1170.658539] {83}[Hardware Error]: vendor_id: 0x10de, device_id: 0x22ba
[ 1170.658539] {83}[Hardware Error]: class_code: 040300
[ 1170.658540] {83}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000
[ 1170.658541] {83}[Hardware Error]: Error 2, type: corrected
[ 1170.658541] {83}[Hardware Error]: section_type: PCIe error
[ 1170.658542] {83}[Hardware Error]: port_type: 1, legacy PCI end point
[ 1170.658543] {83}[Hardware Error]: version: 0.2
[ 1170.658543] {83}[Hardware Error]: command: 0x0407, status: 0x0010
[ 1170.658544] {83}[Hardware Error]: device_id: 0000:01:00.0
[ 1170.658545] {83}[Hardware Error]: slot: 0
[ 1170.658546] {83}[Hardware Error]: secondary_bus: 0x00
[ 1170.658547] {83}[Hardware Error]: vendor_id: 0x10de, device_id: 0x2684
[ 1170.658548] {83}[Hardware Error]: class_code: 030000
[ 1170.658549] {83}[Hardware Error]: bridge: secondary_status: 0xc000, control: 0x0000
[ 1170.658550] {83}[Hardware Error]: Error 3, type: corrected
[ 1170.658550] {83}[Hardware Error]: section_type: PCIe error
[ 1170.658551] {83}[Hardware Error]: port_type: 0, PCIe end point
[ 1170.658552] {83}[Hardware Error]: version: 0.2
[ 1170.658553] {83}[Hardware Error]: command: 0x0006, status: 0x0010
[ 1170.658553] {83}[Hardware Error]: device_id: 0000:01:00.1
[ 1170.658554] {83}[Hardware Error]: slot: 0
[ 1170.658555] {83}[Hardware Error]: secondary_bus: 0x00
[ 1170.658556] {83}[Hardware Error]: vendor_id: 0x10de, device_id: 0x22ba
[ 1170.658556] {83}[Hardware Error]: class_code: 040300
[ 1170.658557] {83}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000
[ 1170.658695] nvidia 0000:01:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[ 1170.658736] nvidia 0000:01:00.0: [ 0] RxErr (First)
[ 1170.658738] nvidia 0000:01:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[ 1170.658853] snd_hda_intel 0000:01:00.1: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[ 1170.658877] snd_hda_intel 0000:01:00.1: [ 0] RxErr (First)
[ 1170.658879] snd_hda_intel 0000:01:00.1: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[ 1170.658947] nvidia 0000:01:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[ 1170.658964] nvidia 0000:01:00.0: [ 0] RxErr (First)
[ 1170.658965] nvidia 0000:01:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[ 1170.659081] snd_hda_intel 0000:01:00.1: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[ 1170.659097] snd_hda_intel 0000:01:00.1: [ 0] RxErr (First)
[ 1170.659098] snd_hda_intel 0000:01:00.1: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Hardware details
> inxi -F
System:
Host: *** Kernel: 6.3.9-arch1-1 arch: x86_64 bits: 64 Console: pty pts/5 (vt 1)
Distro: Arch Linux
Machine:
Type: Server Mobo: ASRockRack model: ROMED8-2T serial: <superuser required>
UEFI: American Megatrends v: P3.50 date: 07/19/2022
CPU:
Info: 24-core model: AMD EPYC 7402P bits: 64 type: MT MCP cache: L2: 12 MiB
Speed (MHz): avg: 1572 min/max: 1500/2800 cores: 1: 2800 2: 1500 3: 1500 4: 1500 5: 1500
6: 1500 7: 1500 8: 1500 9: 1500 10: 1500 11: 1500 12: 1500 13: 1500 14: 2400 15: 1500 16: 2800
17: 1500 18: 1500 19: 1500 20: 1500 21: 1500 22: 1500 23: 1500 24: 1500 25: 1500 26: 1500
27: 1500 28: 1500 29: 1500 30: 1500 31: 1500 32: 1500 33: 1500 34: 1500 35: 1500 36: 1500
37: 1500 38: 1500 39: 1500 40: 1500 41: 1500 42: 1500 43: 1500 44: 1500 45: 1500 46: 1500
47: 1500 48: 1500
Graphics:
Device-1: NVIDIA AD102 [GeForce RTX 4090] driver: nvidia v: 535.54.03
Device-2: ASPEED Graphics Family driver: ast v: kernel
Display: server: X.org v: 1.21.1.8 driver: gpu: ast tty: 105x47 resolution: 1920x1080
API: OpenGL Message: GL data unavailable in console and glxinfo missing.
Audio:
Device-1: NVIDIA AD102 High Definition Audio driver: snd_hda_intel
API: ALSA v: k6.3.9-arch1-1 status: kernel-api
Network:
Device-1: Intel Ethernet X550 driver: ixgbe
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: d0:50:99:dc:94:69
Device-2: Intel Ethernet X550 driver: ixgbe
IF: eno2 state: up speed: 1000 Mbps duplex: full mac: d0:50:99:dc:94:6a
IF-ID-1: br-8aa902e35839 state: down mac: 02:42:22:e3:f9:3e
IF-ID-2: br-ac3821849b62 state: up speed: 10000 Mbps duplex: unknown mac: 02:42:9e:66:08:10
IF-ID-3: docker0 state: down mac: 02:42:c5:ea:bd:4a
IF-ID-4: vethcc9692c state: up speed: 10000 Mbps duplex: full mac: 0a:95:91:71:05:d1
IF-ID-5: vethfc68e9e state: up speed: 10000 Mbps duplex: full mac: 42:ef:6c:8b:06:f9
Drives:
Local Storage: total: 945.92 GiB used: 200.09 GiB (21.2%)
ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 980 PRO 1TB size: 931.51 GiB
ID-2: /dev/sda vendor: Kingston model: DataTraveler 3.0 size: 14.41 GiB type: USB
Partition:
ID-1: / size: 884.32 GiB used: 200 GiB (22.6%) fs: ext4 dev: /dev/nvme0n1p3
ID-2: /boot size: 341.3 MiB used: 89.2 MiB (26.1%) fs: vfat dev: /dev/nvme0n1p1
Swap:
ID-1: swap-1 type: partition size: 31.65 GiB used: 0 KiB (0.0%) dev: /dev/nvme0n1p2
Sensors:
Src: ipmi Permissions: Unable to run ipmi sensors. Root privileges required.
Src: lm-sensors System Temperatures: cpu: 32.5 C mobo: 37.0 C
Fan Speeds (RPM): fan-1: 0 fan-2: 0 fan-3: 0 fan-4: 0 fan-5: 0
Info:
Processes: 565 Uptime: 58m Memory: available: 314.57 GiB used: 4.02 GiB (1.3%) Init: systemd
Shell: fish inxi: 3.3.27
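For anyone reading the log above: aer_status is a bitmask, and in these messages it reflects the PCIe AER Correctable Error Status register (every record is "type: corrected"). The "[ 0] RxErr" line means bit 0, Receiver Error, is the only bit set. Here is a minimal Python sketch of that decoding. The table name and the function decode_correctable are made up for illustration, and the short labels are the ones the kernel prints as far as I know.

```python
# Correctable Error Status bits of the PCIe AER capability.
# Keys are bit positions; values are the short labels seen in kernel logs.
# (Illustrative table, not copied from kernel source.)
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_correctable(status: int, mask: int = 0) -> list[str]:
    """Return labels for every bit set in status and not set in mask."""
    active = status & ~mask
    names = []
    for bit in range(32):
        if active & (1 << bit):
            # Unknown or reserved bits are reported by position.
            names.append(AER_CORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


# Values taken from the log line in this post.
print(decode_correctable(0x00000001, 0x00000000))
```

With the values from this post the only label returned is RxErr, which matches "aer_layer=Physical Layer, aer_agent=Receiver ID" in the log: the error is detected at the physical link and corrected, not reported by the driver itself.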
Your motherboard comes with an ASPEED AST2500 graphics controller.
According to the manual, the firmware has an option to select between the ASPEED and external VGA.
In firmware setup under Advanced you should see OnBrd/Ext VGA Select.
What is it set to, and does changing it make a difference?
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
Thanks for checking.
I am using the OnBrd. Switching to EXT VGA did not make a difference.
Please post the full dmesg output as well as your /etc/mkinitcpio.conf file.
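A full dmesg with this many AER reports gets long, so a per-device count is easier to read alongside it. A rough Python sketch, assuming the kernel line format shown in the first post; the regex and the name count_aer_reports are illustrative only, and the sample lines are copied from that post.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 1170.658695] nvidia 0000:01:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
AER_LINE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<driver>\S+)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"AER: aer_status: (?P<status>0x[0-9a-f]+), aer_mask: (?P<mask>0x[0-9a-f]+)"
)


def count_aer_reports(lines):
    """Count AER status lines per (PCI address, driver, aer_status)."""
    counts = Counter()
    for line in lines:
        m = AER_LINE.match(line.strip())
        if m:
            counts[(m["bdf"], m["driver"], m["status"])] += 1
    return counts


# Sample lines copied from the dmesg output in the first post.
sample = """\
[ 1170.658695] nvidia 0000:01:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[ 1170.658736] nvidia 0000:01:00.0: [ 0] RxErr (First)
[ 1170.658853] snd_hda_intel 0000:01:00.1: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[ 1170.658947] nvidia 0000:01:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[ 1170.659081] snd_hda_intel 0000:01:00.1: AER: aer_status: 0x00000001, aer_mask: 0x00000000
""".splitlines()

for (bdf, driver, status), n in sorted(count_aer_reports(sample).items()):
    print(f"{bdf} ({driver}) aer_status={status}: {n} report(s)")
```

To run it against a real log, save the dmesg output to a file and pass the file's lines to count_aer_reports instead of the sample.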