#1 2019-01-26 21:34:24

avi9526
Member
Registered: 2015-05-15
Posts: 116

diagnose dynamic trunking LACP between networkd and switch

I have a PC with multiple VLANs over two Ethernet cards (1 Gbit/s each), bonded with LACP and connected to two ports of the smart switch DGS-1210-28 (router-on-a-stick configuration). The PC runs ntop-ng, so I can monitor traffic. It works, but ALL traffic goes through a single port, and disconnecting that port breaks the connection, which shouldn't happen with LACP trunking. Unfortunately, the switch runs old firmware that does not show link-aggregation status, and I can't find how to check LACP status (port states, connected device info, etc.) in networkd. How can I diagnose such problems on Arch?

networkd configs:

enp-any.network 

[Match]
Name=enp*

[Network]
Bond=Trunk0

Trunk0.netdev

[NetDev]
Name=Trunk0
Kind=bond

[Bond]
Mode=802.3ad
TransmitHashPolicy=layer3+4
MIIMonitorSec=1s
LACPTransmitRate=slow

Trunk0.network

[Match]
Name=Trunk0

[Network]
VLAN=***
VLAN=***
VLAN=***
LinkLocalAddressing=no
BindCarrier=enp2s0 enp7s0

networkctl shows the setup state "configuring" for the two underlying interfaces (enp2s0 and enp7s0), which is weird:

networkctl 
IDX LINK             TYPE               OPERATIONAL SETUP     
  1 lo               loopback           carrier     unmanaged 
  2 enp2s0           ether              carrier     configuring
  3 enp7s0           ether              carrier     configuring
  4 Trunk0           bond               carrier     configured
  5 ***             vlan               routable    configured
  6 ***             vlan               routable    configured
  7 ***             vlan               routable    configured

#2 2019-01-27 02:26:51

avi9526
Member
Registered: 2015-05-15
Posts: 116

Re: diagnose dynamic trunking LACP between networkd and switch

It appears that the bond state can be checked with

cat /proc/net/bonding/*

which shows the problem:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: **:**:**:**:**:ad
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 9
        Partner Key: 3
        Partner Mac Address: **:**:**:**:**:6e

Slave Interface: enp7s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: **:**:**:**:**:67
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: **:**:**:**:**:ad
    port key: 0
    port priority: 255
    port number: 1
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: enp2s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: **:**:**:**:**:ef
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: **:**:**:**:**:ad
    port key: 9
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: **:**:**:**:**:6e
    oper key: 3
    port priority: 128
    port number: 15
    port state: 61

One of the bonded NICs (enp7s0) is in the "churned" state. This seems to be the result of that NIC having an "Aggregator ID" different from the bond's active aggregator, which breaks aggregation for this NIC. I don't understand why this is happening.
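
The "port state" numbers in the /proc/net/bonding output above are bit fields from the LACP standard (IEEE 802.3ad / 802.1AX), so they can be decoded to see what each side is reporting. A minimal Python sketch follows; the function and table names are made up for illustration and are not part of any tool.

```python
# Bit names for the LACP actor/partner state octet (IEEE 802.1AX),
# listed from the least significant bit to the most significant bit.
LACP_STATE_BITS = (
    "activity",         # 0x01: 1 = active LACP, 0 = passive
    "timeout",          # 0x02: 1 = short timeout, 0 = long timeout
    "aggregation",      # 0x04: link is allowed to be aggregated
    "synchronization",  # 0x08: port is in sync with its aggregator
    "collecting",       # 0x10: collecting incoming frames is enabled
    "distributing",     # 0x20: distributing outgoing frames is enabled
    "defaulted",        # 0x40: using defaulted partner info, not a received LACPDU
    "expired",          # 0x80: receive state machine is in the EXPIRED state
)


def decode_port_state(value):
    """Return the names of the bits set in an LACP port state value."""
    return [name for bit, name in enumerate(LACP_STATE_BITS) if value & (1 << bit)]


# Values taken from the dump above: 69 and 1 (enp7s0), 61 (enp2s0).
for state in (69, 61, 1):
    print(state, decode_port_state(state))
```

With this table, 61 decodes to activity, aggregation, synchronization, collecting and distributing, while 69 decodes to activity, aggregation and defaulted. Together with the all-zero partner MAC on enp7s0, the "defaulted" bit would suggest that this port is not receiving LACPDUs from the switch, though the dump alone does not say why.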

Last edited by avi9526 (2019-01-27 02:28:33)

#3 2019-03-02 21:40:38

avi9526
Member
Registered: 2015-05-15
Posts: 116

Re: diagnose dynamic trunking LACP between networkd and switch

I noticed that this problem happens when I restart the machine with kexec (LTS kernel). After a full restart with a complete power-off, everything is OK: all interfaces have Aggregator ID 1. Setting the churned link down before kexec does not help.
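
To compare the result after a kexec with the result after a cold boot, the check for "all interfaces share the bond's aggregator ID" can be scripted. This is only a sketch that assumes the /proc/net/bonding text layout shown earlier in this thread (it may differ between kernel versions); the function names are hypothetical.

```python
import re


def parse_bonding(text):
    """Parse /proc/net/bonding/<bond> text.

    Returns (active_aggregator_id, {slave_name: slave_aggregator_id}).
    """
    active_id = None
    slaves = {}
    current = None  # name of the slave section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        if m:
            if current is None:
                # Seen before any slave section: the active aggregator.
                active_id = int(m.group(1))
            else:
                slaves[current] = int(m.group(1))
    return active_id, slaves


def slaves_outside_active(text):
    """Return sorted names of slaves not in the active aggregator."""
    active_id, slaves = parse_bonding(text)
    return sorted(name for name, agg in slaves.items() if agg != active_id)
```

For example, `slaves_outside_active(open("/proc/net/bonding/Trunk0").read())` returns an empty list when every slave has joined the active aggregator, and the names of the stray interfaces otherwise, which makes it easy to run once after each kind of restart.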

Last edited by avi9526 (2019-03-02 22:00:55)
