RDMA hardware management and network topology inspection
Common tools
| Tool | Applicable to | Main functions | Notes |
|---|---|---|---|
| ibv_devinfo | InfiniBand, RoCE | Shows detailed information for the local node's RDMA devices: device type, firmware version, port state, link layer, etc. | Part of the libibverbs-utils package; works with all RDMA devices |
| ibnetdiscover | InfiniBand | Discovers the complete topology of an InfiniBand subnet, showing how all nodes and switches are connected | Sweeps the IB subnet with directed-route SMP management queries, so it can be run on any IB node to obtain the full fabric topology |
| lldpctl | RoCE (Ethernet) | Shows Ethernet-based RoCE topology: how the local node connects to its directly attached switch | Based on the LLDP protocol, so it only sees single-hop neighbors; requires the lldpd service |
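All three are ordinary CLI utilities. A minimal install-and-run sketch (package names assume a Debian/Ubuntu host; adjust for your distribution):

apt install libibverbs-utils infiniband-diags lldpd  # provides ibv_devinfo, ibnetdiscover, and lldpctl respectively
ibv_devinfo    # local RDMA device details
ibnetdiscover  # full IB fabric topology (InfiniBand only)
lldpctl        # directly attached Ethernet neighbors (RoCE)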
InfiniBand
PCI device information
List the NICs: lspci | grep -iE "InfiniBand|Mellanox"
29:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
3b:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
4b:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
5d:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
98:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
98:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
ab:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
bb:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
cb:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
db:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
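To relate these PCI addresses to the RDMA device names used in the rest of this section, the standard sysfs tree can be walked directly (a small sketch using only stock Linux paths):

for dev in /sys/class/infiniband/*; do
  # prints e.g. "mlx5_0 -> 0000:29:00.0"
  echo "$(basename "$dev") -> $(basename "$(readlink -f "$dev/device")")"
done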
PCI NFD labels
If the Kubernetes cluster has the NFD (Node Feature Discovery) component installed, NFD automatically detects PCI devices and applies labels such as the following to the node.
feature.node.kubernetes.io/rdma.available: "true"
feature.node.kubernetes.io/rdma.capable: "true"
feature.node.kubernetes.io/pci-10de.present: "true"
feature.node.kubernetes.io/pci-10de.sriov.capable: "true"
feature.node.kubernetes.io/pci-15b3.present: "true"
feature.node.kubernetes.io/pci-15b3.sriov.capable: "true"
feature.node.kubernetes.io/pci-1a03.present: "true"
NFD label reference
| Label | Meaning | Notes |
|---|---|---|
| feature.node.kubernetes.io/rdma.available | RDMA available | The node has a NIC with RDMA currently enabled |
| feature.node.kubernetes.io/rdma.capable | RDMA capable | The node's hardware supports RDMA (even if not currently enabled) |
| feature.node.kubernetes.io/pci-10de.present | NVIDIA device present | The node has a PCI device from NVIDIA (Vendor ID 0x10de), typically a GPU |
| feature.node.kubernetes.io/pci-10de.sriov.capable | NVIDIA SR-IOV capable | The NVIDIA device supports SR-IOV virtualization |
| feature.node.kubernetes.io/pci-15b3.present | Mellanox device present | The node has a PCI device from Mellanox (now NVIDIA Networking, Vendor ID 0x15b3), typically an InfiniBand/RoCE NIC |
| feature.node.kubernetes.io/pci-15b3.sriov.capable | Mellanox SR-IOV capable | The Mellanox NIC supports SR-IOV virtualization |
| feature.node.kubernetes.io/pci-1a03.present | ASPEED device present | The node has a PCI device from ASPEED (Vendor ID 0x1a03), typically the BMC management chip |
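These labels can be used directly as node selectors. For example, to list RDMA-capable nodes or dump one node's feature labels (a sketch; <node-name> is a placeholder and jq is assumed to be available):

kubectl get nodes -l feature.node.kubernetes.io/rdma.available=true
kubectl get node <node-name> -o json | jq '.metadata.labels | with_entries(select(.key | startswith("feature.node.kubernetes.io")))'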
RDMA device information
Run ibv_devinfo to list every RDMA NIC on the local node that speaks the IB verbs protocol, whether a physical IB NIC or a physical RoCE NIC:
hca_id: mlx5_0 # HCA device ID, the RDMA device identifier on this system
transport: InfiniBand (0) # transport type; 0 = InfiniBand
fw_ver: 28.39.4082 # firmware version
node_guid: b8e9:2403:0045:99fe # node Globally Unique Identifier (GUID)
sys_image_guid: b8e9:2403:0045:99fe # system image GUID, usually equal to node_guid
vendor_id: 0x02c9 # vendor ID; 0x02c9 is Mellanox/NVIDIA
vendor_part_id: 4129 # part ID; 4129 corresponds to ConnectX-7
hw_ver: 0x0 # hardware revision
board_id: MT_0000000838 # board ID
phys_port_cnt: 1 # number of physical ports
port: 1 # port number
state: PORT_ACTIVE (4) # port state: active (4), link up and usable
max_mtu: 4096 (5) # maximum transmission unit, 4096 bytes
active_mtu: 4096 (5) # currently active MTU
sm_lid: 75 # Local Identifier (LID) of the Subnet Manager
port_lid: 5 # this port's LID, used for routing within the subnet
port_lmc: 0x00 # LID Mask Count, used for multipath support
link_layer: InfiniBand # link-layer protocol
hca_id: mlx5_1
transport: InfiniBand (0)
fw_ver: 28.39.4082
node_guid: b8e9:2403:0047:13c2
sys_image_guid: b8e9:2403:0047:13c2
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: MT_0000000838
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1) # port state: down (1), unplugged or disabled
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0 # SM LID of 0 means no Subnet Manager is reachable
port_lid: 65535 # LID 65535 means the port has not been assigned a valid LID
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_2
transport: InfiniBand (0)
fw_ver: 28.39.4082
node_guid: b8e9:2403:0045:9bd6
sys_image_guid: b8e9:2403:0045:9bd6
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: MT_0000000838
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 75
port_lid: 4
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_3
transport: InfiniBand (0)
fw_ver: 28.39.4082
node_guid: b8e9:2403:0047:1002
sys_image_guid: b8e9:2403:0047:1002
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: MT_0000000838
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 65535
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_6
transport: InfiniBand (0)
fw_ver: 28.39.4082
node_guid: b8e9:2403:0045:9b16
sys_image_guid: b8e9:2403:0045:9b16
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: MT_0000000838
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 75
port_lid: 10
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_7
transport: InfiniBand (0)
fw_ver: 28.39.4082
node_guid: b8e9:2403:0047:179a
sys_image_guid: b8e9:2403:0047:179a
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: MT_0000000838
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 65535
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_8
transport: InfiniBand (0)
fw_ver: 28.39.4082
node_guid: b8e9:2403:0047:1762
sys_image_guid: b8e9:2403:0047:1762
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: MT_0000000838
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 75
port_lid: 9
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_9
transport: InfiniBand (0)
fw_ver: 28.39.4082
node_guid: b8e9:2403:0047:1ad2
sys_image_guid: b8e9:2403:0047:1ad2
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: MT_0000000838
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 65535
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_bond_0 # bonded device, an aggregate of multiple physical NIC ports
transport: InfiniBand (0)
fw_ver: 14.32.1010 # older firmware line (ConnectX-4 Lx)
node_guid: 94a6:d800:008a:4861
sys_image_guid: 94a6:d800:008a:4861
vendor_id: 0x02c9
vendor_part_id: 4117 # 4117 corresponds to ConnectX-4 Lx
hw_ver: 0x0
board_id: H3C0010110034 # H3C-branded OEM NIC
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3) # active IB MTU is 1024 (enum 3), the largest RDMA MTU that fits in a standard 1500-byte Ethernet frame
sm_lid: 0 # no SM in Ethernet mode
port_lid: 0 # no LID concept in Ethernet mode
port_lmc: 0x00
link_layer: Ethernet # link layer is Ethernet, i.e. RoCE mode
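With this many devices the full dump is noisy; a plain grep filter reduces it to one state line per device:

ibv_devinfo | grep -E "hca_id|state|link_layer"  # device name, port state, and IB vs. Ethernet at a glance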
InfiniBand topology
Use ibnetdiscover to inspect the IB topology. The tool sweeps the IB subnet itself with directed-route SMP management queries, walking every switch and HCA in the fabric, so running it on any IB node yields the same complete fabric topology.
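The annotated dump below comes from a plain invocation; the -p option prints a flat one-line-per-link report that is easier to grep and diff:

ibnetdiscover     # full topology dump (annotated below)
ibnetdiscover -p  # one line per link: node type, LID, port, GUID, width/speed, peer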
#
# Topology file: generated on Fri Dec 26 13:52:02 2025 # generation time of this topology snapshot
#
# Initiated from node b8e9240300471762 port b8e9240300471762 # GUID of the node that initiated the query
# ====== Switch information ======
vendid=0x2c9 # vendor ID; 0x2c9 is Mellanox/NVIDIA
devid=0xd2f2 # device ID; 0xd2f2 is the MQM9700 switch ASIC
sysimgguid=0xb0cf0e0300d66c80 # system image GUID
switchguid=0xb0cf0e0300d66c80(b0cf0e0300d66c80) # switch GUID
Switch 65 "S-b0cf0e0300d66c80" # "Switch" marks a switch node; 65 is its port count
# "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" switch description
# enhanced port 0 this is an enhanced port 0
# lid 2 the switch's Local Identifier
# lmc 0 LMC is 0
[1] "H-b8e924030045aaae"[1](b8e924030045aaae) # 交换机端口1连接的主机
# "xxxx-hpc-37-1-ai mlx5_0" 主机名和网卡设备
# lid 71 该主机端口的LID
# 4xNDR 链路速率为NDR(400Gbps)×4通道
[2] "H-b8e924030045981e"[1](b8e924030045981e) # 交换机端口2连接到另一个主机
# "xxxx-hpc-37-1-ai mlx5_2" lid 73 4xNDR
[3] "H-b8e924030045ab1e"[1](b8e924030045ab1e) # 交换机端口3
# "xxxx-hpc-37-1-ai mlx5_8" lid 7 4xNDR
[4] "H-b8e924030045963e"[1](b8e924030045963e) # 交换机端口4
# "xxxx-hpc-37-1-ai mlx5_6" lid 8 4xNDR
[5] "H-b8e92403004599fe"[1](b8e92403004599fe) # 交换机端口5
# "xxxx-hpc-37-2-ai mlx5_0" lid 5 4xNDR
[6] "H-b8e9240300459bd6"[1](b8e9240300459bd6) # 交换机端口6
# "xxxx-hpc-37-2-ai mlx5_2" lid 4 4xNDR
[7] "H-b8e9240300459b16"[1](b8e9240300459b16) # 交换机端口7
# "xxxx-hpc-37-2-ai mlx5_6" lid 10 4xNDR
[8] "H-b8e9240300471762"[1](b8e9240300471762) # 交换机端口8
# "xxxx-hpc-37-2-ai mlx5_8" lid 9 4xNDR
[9] "H-b8e924030040652a"[1](b8e924030040652a) # 交换机端口9
# "xxxx-hpc-37-3-ai mlx5_0" lid 75 4xNDR
[10] "H-b8e924030047100a"[1](b8e924030047100a) # 交换机端口10
# "xxxx-hpc-37-3-ai mlx5_2" lid 77 4xNDR
[11] "H-b8e9240300471542"[1](b8e9240300471542) # 交换机端口11
# "xxxx-hpc-37-3-ai mlx5_6" lid 14 4xNDR
[12] "H-b8e9240300471a1a"[1](b8e9240300471a1a) # 交换机端口12
# "xxxx-hpc-37-3-ai mlx5_8" lid 13 4xNDR
[13] "H-b8e9240300459c0e"[1](b8e9240300459c0e) # 交换机端口13
# "xxxx-hpc-37-4-ai mlx5_0" lid 79 4xNDR
[14] "H-b8e9240300459a06"[1](b8e9240300459a06) # 交换机端口14
# "xxxx-hpc-37-4-ai mlx5_2" lid 81 4xNDR
[15] "H-b8e92403004068d2"[1](b8e92403004068d2) # 交换机端口15
# "xxxx-hpc-37-4-ai mlx5_6" lid 12 4xNDR
[16] "H-b8e924030045b536"[1](b8e924030045b536) # 交换机端口16
# "xxxx-hpc-37-4-ai mlx5_8" lid 11 4xNDR
[65] "H-b0cf0e0300d66c88"[1](b0cf0e0300d66c88) # 交换机端口65,连接聚合节点
# "Mellanox Technologies Aggregation Node" lid 6 4xNDR
# ====== Host Channel Adapter (HCA) information ======
vendid=0x2c9
devid=0x1021 # device ID; 0x1021 is the ConnectX-7 NIC
sysimgguid=0xb8e924030045b536 # system image GUID
caguid=0xb8e924030045b536 # channel adapter GUID
Ca 1 "H-b8e924030045b536" # "Ca" marks a host channel adapter; 1 is its port count
# "xxxx-hpc-37-4-ai mlx5_8" hostname and device name
[1](b8e924030045b536) "S-b0cf0e0300d66c80"[16] # this HCA's port 1 attaches to switch port 16
# lid 11 lmc 0 the port's LID and LMC
# "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" switch description
# lid 2 the switch's LID
# 4xNDR link speed
vendid=0x2c9
devid=0xcf09 # device ID; 0xcf09 is the aggregation node device
sysimgguid=0xb0cf0e0300d66c80
caguid=0xb0cf0e0300d66c88
Ca 1 "H-b0cf0e0300d66c88" # 聚合节点的HCA
# "Mellanox Technologies Aggregation Node"
[1](b0cf0e0300d66c88) "S-b0cf0e0300d66c80"[65] # attaches to switch port 65
# lid 6 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e92403004068d2
caguid=0xb8e92403004068d2
Ca 1 "H-b8e92403004068d2" # "xxxx-hpc-37-4-ai mlx5_6"
[1](b8e92403004068d2) "S-b0cf0e0300d66c80"[15] # attaches to switch port 15
# lid 12 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e9240300459a06
caguid=0xb8e9240300459a06
Ca 1 "H-b8e9240300459a06" # "xxxx-hpc-37-4-ai mlx5_2"
[1](b8e9240300459a06) "S-b0cf0e0300d66c80"[14] # lid 81 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e9240300459c0e
caguid=0xb8e9240300459c0e
Ca 1 "H-b8e9240300459c0e" # "xxxx-hpc-37-4-ai mlx5_0"
[1](b8e9240300459c0e) "S-b0cf0e0300d66c80"[13] # lid 79 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e9240300471a1a
caguid=0xb8e9240300471a1a
Ca 1 "H-b8e9240300471a1a" # "xxxx-hpc-37-3-ai mlx5_8"
[1](b8e9240300471a1a) "S-b0cf0e0300d66c80"[12] # lid 13 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e9240300471542
caguid=0xb8e9240300471542
Ca 1 "H-b8e9240300471542" # "xxxx-hpc-37-3-ai mlx5_6"
[1](b8e9240300471542) "S-b0cf0e0300d66c80"[11] # lid 14 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e924030047100a
caguid=0xb8e924030047100a
Ca 1 "H-b8e924030047100a" # "xxxx-hpc-37-3-ai mlx5_2"
[1](b8e924030047100a) "S-b0cf0e0300d66c80"[10] # lid 77 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e924030040652a
caguid=0xb8e924030040652a
Ca 1 "H-b8e924030040652a" # "xxxx-hpc-37-3-ai mlx5_0"
[1](b8e924030040652a) "S-b0cf0e0300d66c80"[9] # lid 75 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e9240300459b16
caguid=0xb8e9240300459b16
Ca 1 "H-b8e9240300459b16" # "xxxx-hpc-37-2-ai mlx5_6"
[1](b8e9240300459b16) "S-b0cf0e0300d66c80"[7] # lid 10 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e9240300459bd6
caguid=0xb8e9240300459bd6
Ca 1 "H-b8e9240300459bd6" # "xxxx-hpc-37-2-ai mlx5_2"
[1](b8e9240300459bd6) "S-b0cf0e0300d66c80"[6] # lid 4 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e92403004599fe
caguid=0xb8e92403004599fe
Ca 1 "H-b8e92403004599fe" # "xxxx-hpc-37-2-ai mlx5_0"
[1](b8e92403004599fe) "S-b0cf0e0300d66c80"[5] # lid 5 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e924030045963e
caguid=0xb8e924030045963e
Ca 1 "H-b8e924030045963e" # "xxxx-hpc-37-1-ai mlx5_6"
[1](b8e924030045963e) "S-b0cf0e0300d66c80"[4] # lid 8 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e924030045ab1e
caguid=0xb8e924030045ab1e
Ca 1 "H-b8e924030045ab1e" # "xxxx-hpc-37-1-ai mlx5_8"
[1](b8e924030045ab1e) "S-b0cf0e0300d66c80"[3] # lid 7 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e924030045981e
caguid=0xb8e924030045981e
Ca 1 "H-b8e924030045981e" # "xxxx-hpc-37-1-ai mlx5_2"
[1](b8e924030045981e) "S-b0cf0e0300d66c80"[2] # lid 73 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e924030045aaae
caguid=0xb8e924030045aaae
Ca 1 "H-b8e924030045aaae" # "xxxx-hpc-37-1-ai mlx5_0"
[1](b8e924030045aaae) "S-b0cf0e0300d66c80"[1] # lid 71 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
vendid=0x2c9
devid=0x1021
sysimgguid=0xb8e9240300471762
caguid=0xb8e9240300471762
Ca 1 "H-b8e9240300471762" # "xxxx-hpc-37-2-ai mlx5_8"
[1](b8e9240300471762) "S-b0cf0e0300d66c80"[8] # lid 9 lmc 0 "MF0;DCXNYD-IB03-D1F2-D207-F08-U45:MQM9700/U1" lid 2 4xNDR
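For day-to-day checks, the same infiniband-diags package also ships iblinkinfo, and the flat ibnetdiscover report can be filtered by node type (a sketch; exact column layout varies slightly by version):

iblinkinfo                     # link state, width, and speed for every port on every switch
ibnetdiscover -p | grep '^CA'  # only the HCA-side ends of each link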
RoCE
PCI device information
List the NICs: lspci | grep -iE "InfiniBand|Mellanox"
d8:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
d8:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
PCI NFD labels
If the Kubernetes cluster has the NFD component installed, NFD automatically detects PCI devices and applies labels such as the following to the node.
feature.node.kubernetes.io/rdma.available: "true"
feature.node.kubernetes.io/rdma.capable: "true"
feature.node.kubernetes.io/pci-10de.present: "true"
feature.node.kubernetes.io/pci-15b3.present: "true"
feature.node.kubernetes.io/pci-15b3.sriov.capable: "true"
feature.node.kubernetes.io/pci-1a03.present: "true"
RDMA device information
Run ibv_devinfo to list every RDMA NIC on the local node that speaks the IB verbs protocol, whether a physical IB NIC or a physical RoCE NIC:
hca_id: rocep216s0f0 # RoCE device ID; udev naming: roce + PCI location + function number (here PCI bus 0xd8 = 216, slot 0, function 0)
transport: InfiniBand (0) # transport still reads InfiniBand, because RoCE reuses the IB protocol stack
fw_ver: 14.32.1010 # firmware version
node_guid: b83f:d203:00a9:cc14 # node Globally Unique Identifier
sys_image_guid: b83f:d203:00a9:cc14 # system image GUID
vendor_id: 0x02c9 # vendor ID; 0x02c9 is Mellanox/NVIDIA
vendor_part_id: 4117 # part ID; 4117 corresponds to ConnectX-4 Lx
hw_ver: 0x0 # hardware revision
board_id: MT_2420110034 # board ID
phys_port_cnt: 1 # number of physical ports
port: 1 # port number
state: PORT_ACTIVE (4) # port state: active (4), link up and usable
max_mtu: 4096 (5) # maximum supported MTU is 4096 bytes
active_mtu: 1024 (3) # active IB MTU is 1024 (enum 3), the largest RDMA MTU that fits in a standard 1500-byte Ethernet frame
sm_lid: 0 # no Subnet Manager on a RoCE network, so 0
port_lid: 0 # RoCE does not use LID addressing; it uses MAC/IP
port_lmc: 0x00 # no LMC configured
link_layer: Ethernet # link layer is Ethernet, which is what marks this as a RoCE device
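To see which kernel netdev backs an RDMA device, the iproute2 rdma tool prints the mapping (the sample output line here is illustrative); Mellanox's ibdev2netdev does the same if MLNX_OFED is installed:

rdma link show  # e.g. "link rocep216s0f0/1 state ACTIVE physical_state LINK_UP netdev ens64f0np0"
ibdev2netdev    # MLNX_OFED equivalent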
Why does a RoCE NIC report transport: InfiniBand (0)?
RoCE belongs to the IB protocol family; only the underlying transport medium has been swapped for Ethernet. The upper-layer protocol logic, interaction rules, and API calls are identical to those of a physical IB NIC.
In essence, RoCE runs the InfiniBand RDMA protocol stack over Ethernet. IB is a complete RDMA protocol standard, RoCE is that standard's Ethernet-carried implementation, and the two share the same upper RDMA layers; only the transport medium underneath differs.
Why does lspci show two RoCE NICs while ibv_devinfo shows only one device?
The hardware here is a single dual-port Mellanox ConnectX-4 Lx card. lspci enumerates physical PCI functions, one per port (d8:00.0 and d8:00.1 above), while ibv_devinfo enumerates the HCA devices registered with the RDMA (verbs) driver layer; on mlx5-generation NICs each port is a separate PCI function that normally registers as its own RDMA device.
Since ibv_devinfo shows only one device, with phys_port_cnt: 1, just one port of the dual-port card currently exposes a working RDMA function. The other port is either physically unplugged/link-down or has its RDMA function disabled; the NIC has not disappeared. The sysfs checks below can tell which case applies.
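A hedged way to confirm, using standard sysfs paths (the PCI address 0000:d8:00.1 comes from the lspci output above):

ls /sys/bus/pci/devices/0000:d8:00.1/infiniband/ 2>/dev/null  # lists an RDMA device name if function 1 is registered with the verbs layer
ls /sys/bus/pci/devices/0000:d8:00.1/net/ 2>/dev/null         # lists a netdev name if the port is at least bound as plain Ethernet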
Ethernet topology
RoCE runs over Ethernet, and Ethernet can discover its topology via the LLDP protocol, so first install the LLDP tooling:
apt install lldpd
View the RoCE topology with lldpctl; a minimal startup sequence follows below.
Note that lldpctl only discovers the single-hop topology between the current node and its directly attached switch (which switch, port, and VLAN each local NIC connects to); it cannot see other nodes' topology, switch-to-switch uplinks, or any cross-node relationships.
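A minimal sequence on a systemd host (the daemon typically needs about 30 seconds after starting before neighbors appear):

systemctl enable --now lldpd  # start the LLDP daemon
lldpctl                       # human-readable neighbor report (annotated below)
lldpctl -f json               # machine-readable output for scripting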
-------------------------------------------------------------------------------
LLDP neighbors: # LLDP neighbor information
-------------------------------------------------------------------------------
Interface: ens64f0np0, via: LLDP, RID: 1, Time: 0 day, 00:00:18 # local interface name, discovered via LLDP
Chassis: # remote device chassis information
ChassisID: mac ec:cd:4c:e3:09:9c # remote device MAC address (chassis ID)
SysName: DCXNYD-DMXCS-LEAF21_D1F2-D207-A09-U45-U43 # remote switch system name
SysDescr: H3C Comware Platform Software, Software Version 7.1.070, Release 6715 # system description and software version
H3C S6850-56HF # switch model
Copyright (c) 2004-2024 New H3C Technologies Co., Ltd. All rights reserved.
MgmtIP: 10.112.30.254 # switch management IP address
MgmtIface: 2153 # management interface ID
Capability: Bridge, on # bridging capability, enabled
Capability: Router, on # routing capability, enabled
Port: # remote port information
PortID: ifname Twenty-FiveGigE2/0/1 # remote port identifier: 25GbE, slot 2, subslot 0, port 1
PortDescr: Twenty-FiveGigE2/0/1 Interface # port description
TTL: 121 # time to live (seconds) of the LLDP record
MFS: 9416 # maximum frame size (bytes); jumbo frames supported
PMD autoneg: supported: yes, enabled: yes # physical-medium-dependent (PMD) auto-negotiation, supported and enabled
MAU oper type: 25GbaseSR - 25GBASE-R PCS/PMA over multimode fiber # medium attachment unit type: 25G short-reach multimode fiber
MDI Power: supported: no, enabled: no, pair control: no # MDI power (PoE): not supported
Device type: PD # device type: Powered Device
Power pairs: signal # power pairs: signal pairs
Class: class 0 # power class 0 (PoE not in use)
VLAN: 900, pvid: yes # VLAN ID 900, and it is the port VLAN ID (native VLAN)
-------------------------------------------------------------------------------
Interface: ens64f1np1, via: LLDP, RID: 1, Time: 0 day, 00:00:18 # second local interface
Chassis:
ChassisID: mac ec:cd:4c:e3:09:9c # the same switch (same MAC address)
SysName: DCXNYD-DMXCS-LEAF21_D1F2-D207-A09-U45-U43
SysDescr: H3C Comware Platform Software, Software Version 7.1.070, Release 6715
H3C S6850-56HF
Copyright (c) 2004-2024 New H3C Technologies Co., Ltd. All rights reserved.
MgmtIP: 10.112.30.254
MgmtIface: 2153
Capability: Bridge, on
Capability: Router, on
Port:
PortID: ifname Twenty-FiveGigE1/0/1 # a different port on the same switch (slot 1, subslot 0, port 1)
PortDescr: Twenty-FiveGigE1/0/1 Interface
TTL: 121
MFS: 9416
PMD autoneg: supported: yes, enabled: yes
MAU oper type: 25GbaseSR - 25GBASE-R PCS/PMA over multimode fiber
MDI Power: supported: no, enabled: no, pair control: no
Device type: PD
Power pairs: signal
Class: class 0
VLAN: 900, pvid: yes # also in VLAN 900
-------------------------------------------------------------------------------
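For a condensed one-line-per-interface view of the same data, lldpd also ships lldpcli (a sketch; subcommand support may vary by lldpd version):

lldpcli show neighbors summary  # local interface, remote switch name, and remote port at a glance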