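Network overview: In the XX data center, the network management server attaches to CE6810 access switches. The two CE6810 switches are virtualized into a single logical device with iStack. The server uses two NICs and is dual-homed to the CE6810 stack through a cross-device link aggregation. By design, every link aggregation toward a server uses LACP.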

Symptom: A routine inspection of the switch reported MAC address flapping. During troubleshooting, the flapping record could not be cleared with reset mac-address flapping record.

<CE6810-4>dis mac-address flapping 

MAC Address Flapping Configurations :

Flapping detection : Enable
Aging time(s) : 300
Quit-VLAN Recover time(m) : --
Exclude VLAN-list : --
Security level : Middle

S : start time E : end time (D) : error down

Time VLAN MAC Address Original-Port Move-Ports MoveNum

S:2016-09-22 20:37:14 51 ac61-75a5-ee61 10GE2/0/21 10GE1/0/21 65535
E:2018-06-08 18:06:03

Total items on slot 1: 1

Time VLAN MAC Address Original-Port Move-Ports MoveNum

S:2016-09-22 20:37:14 51 ac61-xxxx-ee61 10GE2/0/21 10GE1/0/21 65535
E:2018-06-08 18:06:03

Total items on slot 2: 1
<CE6810-4>reset mac-address flapping record
<CE6810-4>dis mac-address flapping
MAC Address Flapping Configurations :

Flapping detection : Enable
Aging time(s) : 300
Quit-VLAN Recover time(m) : --
Exclude VLAN-list : --
Security level : Middle

S : start time E : end time (D) : error down

Time VLAN MAC Address Original-Port Move-Ports MoveNum

S:2016-09-22 20:37:14 51 ac61-xxxx-ee61 10GE2/0/21 10GE1/0/21 65535
E:2018-06-08 18:06:23
Total items on slot 1: 1
-------------------------------------------------------------------------------
Time VLAN MAC Address Original-Port Move-Ports MoveNum
-------------------------------------------------------------------------------
S:2016-09-22 20:37:14 51 ac61-75a5-ee61 10GE2/0/21 10GE1/0/21 65535
E:2018-06-08 18:06:23
-------------------------------------------------------------------------------
Total items on slot 2: 1
<CE6810-4>

Alarm information

Jun  8 2018 17:52:38 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f044b-alarmID=0x095e0012;MAC flapping detected, VlanId = 51, Original-Port = 10GE2/0/21, Flapping port 1 = 10GE1/0/21, port 2 = -. Check the network connected to the interface learning a flapping MAC address : ac61-75a5-ee61.

Jun 8 2018 17:52:38 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_clear(l):CID=0x807f044b-alarmID=0x095e0012-clearType=service_resume;Mac flapping detection recovered in vlan 51.

Jun 8 2018 17:36:41 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f044b-alarmID=0x095e0012;MAC flapping detected, VlanId = 51, Original-Port = 10GE2/0/21, Flapping port 1 = 10GE1/0/21, port 2 = -. Check the network connected to the interface learning a flapping MAC address : ac61-75a5-ee61.

Jun 8 2018 17:36:39 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_clear(l):CID=0x807f044b-alarmID=0x095e0012-clearType=service_resume;Mac flapping detection recovered in vlan 51.

Jun 8 2018 17:20:39 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f044b-alarmID=0x095e0012;MAC flapping detected, VlanId = 51, Original-Port = 10GE2/0/21, Flapping port 1 = 10GE1/0/21, port 2 = -. Check the network connected to the interface learning a flapping MAC address : ac61-75a5-ee61.

Jun 8 2018 17:20:39 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_clear(l):CID=0x807f044b-alarmID=0x095e0012-clearType=service_resume;Mac flapping detection recovered in vlan 51.

Jun 8 2018 17:04:39 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f044b-alarmID=0x095e0012;MAC flapping detected, VlanId = 51, Original-Port = 10GE2/0/21, Flapping port 1 = 10GE1/0/21, port 2 = -. Check the network connected to the interface learning a flapping MAC address : ac61-75a5-ee61.

Jun 8 2018 17:04:39 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_clear(l):CID=0x807f044b-alarmID=0x095e0012-clearType=service_resume;Mac flapping detection recovered in vlan 51.

Jun 8 2018 16:48:39 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f044b-alarmID=0x095e0012;MAC flapping detected, VlanId = 51, Original-Port = 10GE2/0/21, Flapping port 1 = 10GE1/0/21, port 2 = -. Check the network connected to the interface learning a flapping MAC address : ac61-75a5-ee61.

Jun 8 2018 16:48:39 CE6810-4 %%01FEI_COMM/4/hwMflpVlanLoopAlarm_clear(l):CID=0x807f044b-alarmID=0x095e0012-clearType=service_resume;Mac flapping detection recovered in vlan 51.
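
The timestamps of the hwMflpVlanLoopAlarm_active entries above suggest that the flapping is re-detected on a steady cycle. The following is a minimal Python sketch that measures the gap between consecutive alarms; the list simply copies the five timestamps from the log, and the function name `alarm_intervals` is made up for this illustration.

```python
from datetime import datetime

# Timestamps of the hwMflpVlanLoopAlarm_active entries in the log, oldest first.
ACTIVE_ALARMS = [
    "Jun 8 2018 16:48:39",
    "Jun 8 2018 17:04:39",
    "Jun 8 2018 17:20:39",
    "Jun 8 2018 17:36:41",
    "Jun 8 2018 17:52:38",
]


def alarm_intervals(stamps):
    """Return the gaps, in seconds, between consecutive alarm timestamps."""
    times = [datetime.strptime(s, "%b %d %Y %H:%M:%S") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


intervals = alarm_intervals(ACTIVE_ALARMS)
print(intervals)  # [960, 960, 962, 957]
```

Each gap is roughly 16 minutes, so the alarm is recurring steadily rather than being a one-off event.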

Troubleshooting procedure

First, confirm the MAC address flapping.

<CE6810-4>dis mac-address flapping
MAC Address Flapping Configurations :
-------------------------------------------------------------------------------
Flapping detection : Enable
Aging time(s) : 300
Quit-VLAN Recover time(m) : --
Exclude VLAN-list : --
Security level : Middle
-------------------------------------------------------------------------------
S : start time E : end time (D) : error down
-------------------------------------------------------------------------------
Time VLAN MAC Address Original-Port Move-Ports MoveNum
-------------------------------------------------------------------------------
S:2016-09-22 20:37:14 51 ac61-xxxx-ee61 10GE2/0/21 10GE1/0/21 65535
E:2018-06-08 18:06:03
-------------------------------------------------------------------------------
Total items on slot 1: 1
-------------------------------------------------------------------------------
Time VLAN MAC Address Original-Port Move-Ports MoveNum
-------------------------------------------------------------------------------
S:2016-09-22 20:37:14 51 ac61-xxxx-ee61 10GE2/0/21 10GE1/0/21 65535
E:2018-06-08 18:06:03
-------------------------------------------------------------------------------
Total items on slot 2: 1

Clear the MAC address flapping record and check again. The record is not cleared, and the move count (MoveNum) sits at its maximum value of 65535.

<CE6810-4>reset mac-address flapping record 
<CE6810-4>dis mac-address flapping
MAC Address Flapping Configurations :
-------------------------------------------------------------------------------
Flapping detection : Enable
Aging time(s) : 300
Quit-VLAN Recover time(m) : --
Exclude VLAN-list : --
Security level : Middle
-------------------------------------------------------------------------------
S : start time E : end time (D) : error down
-------------------------------------------------------------------------------
Time VLAN MAC Address Original-Port Move-Ports MoveNum
-------------------------------------------------------------------------------
S:2016-09-22 20:37:14 51 ac61-xxxx-ee61 10GE2/0/21 10GE1/0/21 65535
E:2018-06-08 18:06:28

-------------------------------------------------------------------------------
Total items on slot 1: 1

-------------------------------------------------------------------------------
Time VLAN MAC Address Original-Port Move-Ports MoveNum
-------------------------------------------------------------------------------
S:2016-09-22 20:37:14 51 ac61-xxxx-ee61 10GE2/0/21 10GE1/0/21 65535
E:2018-06-08 18:06:28

-------------------------------------------------------------------------------
Total items on slot 2: 1

Check the alarm log with display logbuffer. The device is continuously printing a large number of alarm logs.

<CE6810-4>dis logbuffer 
Logging buffer configuration and contents : enabled
Allowed max buffer size : 10240
Actual buffer size : 512
Channel number : 4 , Channel name : logbuffer
Dropped messages : 0
Overwritten messages : 125749
Current messages : 512

Jun 8 2018 17:52:38 CE6810-4(7&8) %%01FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f044b-alarmID=0x095e0012;MAC flapping detected, VlanId = 51, Original-Port = 10GE2/0/21, Flapping port 1 = 10GE1/0/21, port 2 = -. Check the network connected to the interface learning a flapping MAC address : ac61-75a5-ee61.
Jun 8 2018 17:52:38 CE6810-4(7&8) %%01FEI_COMM/4/hwMflpVlanLoopAlarm_clear(l):CID=0x807f044b-alarmID=0x095e0012-clearType=service_resume;Mac flapping detection recovered in vlan 51.
Jun 8 2018 17:36:41 CE6810-4(7&8) %%01FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f044b-alarmID=0x095e0012;MAC flapping detected, VlanId = 51, Original-Port = 10GE2/0/21, Flapping port 1 = 10GE1/0/21, port 2 = -. Check the network connected to the interface learning a flapping MAC address : ac61-75a5-ee61.
Jun 8 2018 17:36:39 CE6810-4(7&8) %%01FEI_COMM/4/hwMflpVlanLoopAlarm_clear(l):CID=0x807f044b-alarmID=0x095e0012-clearType=service_resume;Mac flapping detection recovered in vlan 51.



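Because the original port and the move port of the flapping MAC address are physical member interfaces of the same link aggregation group, check the Eth-Trunk binding. The Eth-Trunk interface is down while its physical member interfaces are up, which suggests that LACP negotiation with the server has failed.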
[~CE6810-4-Eth-Trunk2]dis int b
PHY: Physical
*down: administratively down
^down: standby
(l): loopback
(s): spoofing
(b): BFD down
(e): ETHOAM down
(d): Dampening Suppressed
(p): port alarm down
(dl): DLDP down
InUti/OutUti: input utility rate/output utility rate
Interface PHY Protocol InUti OutUti inErrors outErrors
Eth-Trunk2 down down 0% 0% 0 0
10GE1/0/21 up up 0.01% 0.02% 0 0
10GE2/0/21 up up 0.01% 0.03% 0 0
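
The check made at this step (do all flapping ports belong to one Eth-Trunk?) can be sketched in a few lines of Python. This is only an illustration: the function name `trunk_of_flapping_ports` and the hand-written membership dictionary are assumptions, and the values are copied from the flapping record and the Eth-Trunk2 membership in this case, not parsed from the device.

```python
def trunk_of_flapping_ports(ports, trunk_members):
    """Return the trunk name if every flapping port is a member of one trunk, else None."""
    if not ports:
        return None
    for trunk, members in trunk_members.items():
        if set(ports) <= set(members):
            return trunk
    return None


# Values from this case: Original-Port and Move-Ports of the flapping record,
# and the member interfaces of Eth-Trunk2.
flapping_ports = ["10GE2/0/21", "10GE1/0/21"]
trunk_members = {"Eth-Trunk2": ["10GE1/0/21", "10GE2/0/21"]}

suspect = trunk_of_flapping_ports(flapping_ports, trunk_members)
print(suspect)
```

When the function returns a trunk name, the flapping is happening inside one aggregation group, which points at the aggregation itself rather than at a loop elsewhere in the VLAN.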

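Next, check the Eth-Trunk interface configuration and its LACP state. The member ports are in the Indep state, which means a port can forward data but has not received any LACP PDUs from the peer. This suggests that the server is not sending LACP PDUs.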

interface Eth-Trunk2
description To-eSight-2
port default vlan 51
netstream inbound ethernet
netstream outbound ethernet
mode lacp-dynamic

[~CE6810-4-Eth-Trunk2]dis eth-trunk 2
Eth-Trunk2's state information is:
Local:
LAG ID: 2 Working Mode: Dynamic
Preempt Delay: Disabled Hash Arithmetic: profile default
System Priority: 32768 System ID: 6008-1096-a421
Least Active-linknumber: 1 Max Active-linknumber: 16
Operating Status: down Number Of Up Ports In Trunk: 0
Timeout Period: Slow
--------------------------------------------------------------------------------
ActorPortName Status PortType PortPri PortNo PortKey PortState Weight
10GE1/0/21 Indep 1GE 32768 10 561 10100010 1
10GE2/0/21 Indep 1GE 32768 11 561 10100010 1

Partner:
--------------------------------------------------------------------------------
ActorPortName SysPri SystemID PortPri PortNo PortKey PortState
10GE1/0/21 0 0000-0000-0000 0 0 0 10100011
10GE2/0/21 0 0000-0000-0000 0 0 0 10100011

[~CE6810-4-Eth-Trunk2]
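
The PortState column in the output above can be read bit by bit. The sketch below decodes it in Python using the Actor_State flag names from IEEE 802.1AX. It assumes that the CLI prints bit 0 (LACP_Activity) leftmost, which is consistent with the values seen in this case but should be checked against the command reference for the software version in use; the helper name `decode_port_state` is hypothetical.

```python
# Flag names follow the Actor_State / Partner_State bits defined in IEEE 802.1AX.
# Assumption: the CLI prints bit 0 (LACP_Activity) as the leftmost character.
FLAGS = [
    "Activity", "Timeout", "Aggregation", "Synchronization",
    "Collecting", "Distributing", "Defaulted", "Expired",
]


def decode_port_state(bits):
    """Return the names of the flags that are set in an 8-character bit string."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected 8 characters of 0/1, got %r" % bits)
    return [name for name, bit in zip(FLAGS, bits) if bit == "1"]


indep = decode_port_state("10100010")     # local ports while in Indep
selected = decode_port_state("10111100")  # local ports once Selected
print(indep)
print(selected)
```

Under that assumption, the Indep ports have Defaulted set and Synchronization, Collecting and Distributing clear: the port is running on default partner information because no LACP PDU has arrived. The all-zero partner System ID in the same output tells the same story.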

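Log in to the server and check its link aggregation (NIC teaming) configuration. The aggregation type is not LACP; it is set to FEC/GEC.

Change the server-side link aggregation type to LACP.

Check Eth-Trunk2 again. The member ports are now in the Selected state.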

[~CE6810-4-Eth-Trunk1]dis eth 2
Eth-Trunk2's state information is:
Local:
LAG ID: 2 Working Mode: Dynamic
Preempt Delay: Disabled Hash Arithmetic: profile default
System Priority: 32768 System ID: 6008-1096-a421
Least Active-linknumber: 1 Max Active-linknumber: 16
Operating Status: up Number Of Up Ports In Trunk: 2
Timeout Period: Slow
--------------------------------------------------------------------------------
ActorPortName Status PortType PortPri PortNo PortKey PortState Weight
10GE1/0/21 Selected 1GE 32768 10 561 10111100 1
10GE2/0/21 Selected 1GE 32768 11 561 10111100 1

Partner:
--------------------------------------------------------------------------------
ActorPortName SysPri SystemID PortPri PortNo PortKey PortState
10GE1/0/21 1 ac61-75a5-ee61 1 9 1 10111100
10GE2/0/21 1 ac61-75a5-ee61 1 10 1 10111100

Clear the MAC flapping record again and check the result. No MAC flapping record remains.

<CE6810-4>reset mac-address flapping record 
<CE6810-4>dis mac-ad flapping
MAC Address Flapping Configurations :
-------------------------------------------------------------------------------
Flapping detection : Enable
Aging time(s) : 300
Quit-VLAN Recover time(m) : --
Exclude VLAN-list : --
Security level : Middle
-------------------------------------------------------------------------------

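Root cause

The link aggregation mode on the server did not match the mode on the switch. Because no LACP PDUs arrived from the server, both member ports stayed in the Indep state and forwarded as independent ports, so the server's MAC address was learned alternately on 10GE1/0/21 and 10GE2/0/21 and the flapping record kept being regenerated.

Solution

Change the server-side link aggregation mode to LACP.

Recommendations and summary

When troubleshooting, start by thinking broadly. Do not look only at the switch; also check the state of the server it connects to. Then narrow the fault scope step by step until the problem is pinned to a single point.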