Skip to main content

前言

NodeFeatureRuleNode Feature DiscoveryNFD)最强大的功能之一,它提供了一个灵活的规则引擎,允许用户基于节点的硬件特性和系统配置定义自定义的标签规则。通过NodeFeatureRule,可以将底层的硬件特性转换为高层次的语义化标签,从而简化应用的调度配置。本文将深入介绍NodeFeatureRule的配置和使用方法。

NodeFeatureRule工作原理

NodeFeatureRule是一个自定义资源(CRD),用于定义基于节点特性的标签规则。nfd-master会监听NodeFeatureRule对象的变化,并根据规则处理nfd-worker上报的特性数据。

工作流程

工作步骤:

  1. 用户创建NodeFeatureRule对象
  2. nfd-master监听到新的规则
  3. nfd-worker定期扫描并上报节点特性
  4. nfd-master根据规则匹配特性
  5. 如果匹配成功,应用规则定义的标签、注解、扩展资源或污点
  6. 更新节点对象

基本结构

NodeFeatureRule的完整结构:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: my-sample-rule
spec:
rules:
- name: "rule description"
# 输出配置
labels:
"label-key": "label-value"
labelsTemplate: |
{{ range .feature }}...{{ end }}
annotations:
"annotation-key": "annotation-value"
extendedResources:
"resource-name": "resource-quantity"
taints:
- key: "taint-key"
value: "taint-value"
effect: NoSchedule
vars:
"var-name": "var-value"
varsTemplate: |
{{ range .feature }}...{{ end }}

# 匹配条件
matchFeatures:
- feature: <feature-name>
matchExpressions:
<attribute>: {op: <operator>, value: [<values>]}
matchName:
op: <operator>
value: [<values>]

# 或条件
matchAny:
- matchFeatures:
- feature: <feature-name>
matchExpressions:
<attribute>: {op: <operator>, value: [<values>]}

核心字段详解

name字段

规则的名称,用于标识和调试:

- name: "GPU vendor detection"

最佳实践

  • 使用描述性的名称
  • 名称应该清晰说明规则的用途
  • 避免使用特殊字符

labels字段

定义要创建的节点标签,支持静态值和动态值:

labels:
# 静态标签
"vendor.io/gpu": "true"
"hardware.type": "accelerated"

# 动态标签,使用 @ 引用特性值
"vendor.io/gpu-model": "@pci.device.device"
"kernel.version": "@kernel.version.full"

动态值语法

  • 格式:@<feature-name>.<attribute-name>
  • 示例:@cpu.model.vendor_id@kernel.version.major

限制

  • 标签键最长63字符
  • 标签值最长63字符
  • 必须符合Kubernetes标签命名规范

labelsTemplate字段

使用Go template语法动态生成多个标签:

labelsTemplate: |
{{ range .pci.device }}
pci-{{ .vendor }}-{{ .device }}.present=true
{{ end }}

常用模板函数

函数说明示例
range遍历集合{{ range .feature }}...{{ end }}
if条件判断{{ if .attribute }}...{{ end }}
len获取长度{{ .feature | len }}
first获取第一个元素{{ .feature | first }}
trim去除空白{{ .value | trim }}
substr截取子串{{ .value | substr 0 10 }}

模板数据结构

# Flag特性(只有名称)
.<feature-name>:
- Name: <flag-name>

# Attribute特性(名称+值)
.<feature-name>:
- Name: <attribute-name>
Value: <attribute-value>

# Instance特性(多个属性)
.<feature-name>:
- <attribute-1>: <value-1>
<attribute-2>: <value-2>

示例

labelsTemplate: |
{{ range .cpu.cpuid }}cpu-{{ .Name }}=true
{{ end }}
cpu-count={{ .cpu.topology | len }}

annotations字段

定义要创建的节点注解:

annotations:
"vendor.io/gpu-driver-version": "@pci.device.driver"
"hardware.info/detailed-spec": "Custom hardware configuration"

用途

  • 存储详细的硬件信息
  • 记录检测时间戳
  • 保存不适合用作标签的长文本

extendedResources字段

定义扩展资源,可以被Pod请求:

extendedResources:
# 静态数量
vendor.io/fpga: "4"

# 动态数量,引用特性值
vendor.io/gpu-memory: "@pci.device.memory"

使用场景

  • 自定义硬件资源(FPGA、专用加速器等)
  • 基于硬件特性计算的资源配额
  • 需要被Kubernetes调度器感知的资源

taints字段

定义节点污点,用于隔离特殊节点:

taints:
- key: "nvidia.com/gpu"
value: "true"
effect: NoSchedule
- key: "special-hardware"
value: "fpga"
effect: NoExecute

Effect类型

  • NoSchedule:新Pod不会调度到该节点
  • PreferNoSchedule:尽量避免调度新Pod
  • NoExecute:驱逐已有Pod(除非有容忍)

启用污点功能

# nfd-master ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: nfd-master-conf
namespace: node-feature-discovery
data:
nfd-master.conf: |
enableTaints: true

注意事项

  • 使用污点前确保nfd-worker能容忍该污点
  • 否则nfd-worker Pod会被驱逐

vars字段

定义变量,用于规则间传递信息:

- name: "detect gpu"
vars:
gpu_present: "true"
gpu_vendor: "nvidia"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}

- name: "use gpu variable"
labels:
"workload.suitable": "gpu-computing"
matchFeatures:
- feature: rule.matched
matchExpressions:
gpu_present: {op: IsTrue}

用途

  • 在规则间传递检测结果
  • 构建复杂的规则层次
  • 避免重复匹配逻辑

varsTemplate字段

使用模板动态生成多个变量:

varsTemplate: |
{{ range .pci.device }}
device-{{ .vendor }}-{{ .device }}=found
{{ end }}

matchFeatures字段

定义匹配条件,所有条件必须满足(逻辑AND):

matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
dummy: {op: Exists}
- feature: kernel.config
matchExpressions:
X86: {op: In, value: ["y"]}
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}

结构说明

  • 每个feature项是一个特性匹配器
  • 多个特性匹配器之间是AND关系
  • 同一特性匹配器内的多个表达式也是AND关系

matchExpressions语法

matchExpressions:
<attribute-name>: {op: <operator>, value: [<values>]}

matchName语法(匹配特性名称):

matchName:
op: <operator>
value: [<values>]

matchAny字段

定义或条件,只需满足一个(逻辑OR):

matchAny:
- matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}
- matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX2: {op: Exists}

逻辑关系

  • matchAny列表中的每个元素是OR关系
  • 每个元素内的matchFeatures仍然是AND关系

匹配操作符

NodeFeatureRule支持丰富的匹配操作符:

操作符说明值类型示例
In值在列表中字符串列表{op: In, value: ["8086", "10de"]}
NotIn值不在列表中字符串列表{op: NotIn, value: ["bad-value"]}
InRegexp值匹配正则表达式正则列表{op: InRegexp, value: ["^node-.*"]}
Exists属性存在无需值{op: Exists}
DoesNotExist属性不存在无需值{op: DoesNotExist}
Gt大于单个数值{op: Gt, value: ["100"]}
Ge大于等于单个数值{op: Ge, value: ["100"]}
Lt小于单个数值{op: Lt, value: ["100"]}
Le小于等于单个数值{op: Le, value: ["100"]}
GtLt在范围内(开区间)两个数值{op: GtLt, value: ["10", "100"]}
GeLe在范围内(闭区间)两个数值{op: GeLe, value: ["10", "100"]}
IsTrue值为true无需值{op: IsTrue}
IsFalse值为false无需值{op: IsFalse}

操作符使用示例

# 字符串匹配
vendor: {op: In, value: ["8086", "10de"]}
vendor: {op: InRegexp, value: ["^(8086|10de)$"]}

# 存在性检查
AVX512F: {op: Exists}
deprecated_feature: {op: DoesNotExist}

# 数值比较
major: {op: Gt, value: ["5"]}
cores: {op: GtLt, value: ["8", "64"]}

# 布尔值
enabled: {op: IsTrue}
disabled: {op: IsFalse}

类型转换

  • 数值比较操作符(GtLt等)会尝试将字符串转换为整数
  • 如果值包含type: version,则按版本号比较
# 版本号比较
matchExpressions:
major: {op: Gt, value: ["5"], type: version}

可用的特性类型

NFD提供了丰富的特性用于匹配。以下是所有可用特性的详细列表:

CPU相关特性

cpu.cpuid

CPU指令集特性(Flag类型)

常见特性

特性名称说明
AVXAdvanced Vector Extensions
AVX2AVX 2.0
AVX512FAVX-512基础指令集
AVX512DQAVX-512双字和四字指令
AVX512CDAVX-512冲突检测指令
AVX512BWAVX-512字节和字指令
AVX512VLAVX-512向量长度扩展
AVX10_VERSIONAVX10版本号(Attribute类型)
AESNIAES加密指令集
SSEStreaming SIMD Extensions
SSE2SSE 2.0
SSE3SSE 3.0
SSE4SSE 4.0
SSE42SSE 4.2
SSSE3Supplemental SSE3

示例

matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}
AVX512DQ: {op: Exists}

cpu.model

CPU型号信息(Attribute类型)

属性名称类型说明
familyintCPU家族
vendor_idstring供应商ID(如GenuineIntel
idintCPU型号ID
hypervisorstring虚拟化信息(仅s390x

示例

matchFeatures:
- feature: cpu.model
matchExpressions:
family: {op: Gt, value: ["6"]}
vendor_id: {op: In, value: ["GenuineIntel"]}

cpu.topology

CPU拓扑信息(Attribute类型)

属性名称类型说明
hardware_multithreadingbool是否启用超线程
socket_countintCPU插槽数量

示例

matchFeatures:
- feature: cpu.topology
matchExpressions:
hardware_multithreading: {op: IsTrue}
socket_count: {op: Gt, value: ["1"]}

cpu.cstate

Intel电源管理状态(Attribute类型)

属性名称类型说明
enabledboolC-state是否启用

cpu.pstate

Intel P-state驱动状态(Attribute类型)

属性名称类型说明
statusstringactivepassive
turbobool睿频是否启用
scalingstringpowersaveperformance

cpu.rdt

Intel RDT资源管理技术(Attribute类型)

属性名称类型说明
RDTMONflag监控技术
RDTCMTflag缓存监控
RDTMBMflag内存带宽监控
RDTL3CAflagL3缓存分配
RDTl2CAflagL2缓存分配
RDTMBAflag内存带宽分配
RDTL3CA_NUM_CLOSIDintL3 CA CLOSID数量

cpu.security

安全和可信执行环境(Attribute类型)

属性名称类型说明
sgx.enabledboolIntel SGX是否启用
sgx.epcintSGX加密页缓存大小(字节)
se.enabledboolIBM安全执行是否启用
tdx.enabledboolIntel TDX是否启用
tdx.total_keysintTDX密钥总数
tdx.protectedbool是否在TDX虚拟机中运行
sev.enabledboolAMD SEV是否启用
sev.es.enabledboolAMD SEV-ES是否启用
sev.snp.enabledboolAMD SEV-SNP是否启用
sev.asidsintSEV ASID数量
sev.encrypted_state_idsint加密状态ID数量

示例

matchFeatures:
- feature: cpu.security
matchExpressions:
sgx.enabled: {op: IsTrue}
sgx.epc: {op: Gt, value: ["1073741824"]} # > 1GB

cpu.sst

Intel SSTSpeed Select Technology)(Attribute类型)

属性名称类型说明
bf.enabledbool基准频率优化是否启用

cpu.coprocessor

CPU协处理器(Attribute类型)

属性名称类型说明
nx_gzipboolNest加速器GZIP支持

内核相关特性

kernel.config

内核配置选项(Attribute类型)

示例

matchFeatures:
- feature: kernel.config
matchExpressions:
X86: {op: In, value: ["y"]}
SECURITY_SELINUX: {op: In, value: ["y"]}
NUMA: {op: Exists}

kernel.loadedmodule

已加载的内核模块(Flag类型)

示例

matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
veth: {op: Exists}
br_netfilter: {op: Exists}

kernel.enabledmodule

已加载或内建的模块(Flag类型)

kernel.selinux

SELinux状态(Attribute类型)

属性名称类型说明
enabledboolSELinux是否处于强制模式

kernel.version

内核版本信息(Attribute类型)

属性名称类型说明
fullstring完整版本号
majorint主版本号
minorint次版本号
revisionint修订号

示例

matchFeatures:
- feature: kernel.version
matchExpressions:
major: {op: Ge, value: ["5"]}
minor: {op: Ge, value: ["10"]}

内存相关特性

memory.numa

NUMA节点信息(Attribute类型)

属性名称类型说明
is_numabool是否为NUMA架构
node_countintNUMA节点数量

示例

matchFeatures:
- feature: memory.numa
matchExpressions:
is_numa: {op: IsTrue}
node_count: {op: Gt, value: ["1"]}

memory.nv

NVDIMM设备(Instance类型)

属性名称类型说明
devtypestring设备类型
modestring工作模式

memory.swap

交换分区(Attribute类型)

属性名称类型说明
enabledbool是否配置了交换分区

memory.hugepages

大页支持(Attribute类型)

属性名称类型说明
enabledbool是否配置了大页
hugepages-<size>string特定大小的大页数量

示例

matchFeatures:
- feature: memory.hugepages
matchExpressions:
enabled: {op: IsTrue}
hugepages-1Gi: {op: Gt, value: ["0"]}

网络相关特性

network.device

物理网络接口(Instance类型)

属性名称类型说明
namestring接口名称
operstatestring操作状态(up/down
speedstring速率(Mbps
sriov_numvfsstring已创建的VF数量
sriov_totalvfsstring最大VF数量
mtustringMTU大小

示例

matchFeatures:
- feature: network.device
matchExpressions:
sriov_totalvfs: {op: Gt, value: ["0"]}
speed: {op: Ge, value: ["10000"]} # >= 10Gbps

network.virtual

虚拟网络接口(Instance类型)

属性名称类型说明
namestring接口名称
operstatestring操作状态
speedstring速率
mtustringMTU大小

PCI设备特性

pci.device

PCI设备(Instance类型)

属性名称类型说明
classstring设备类别(如0300为显卡)
vendorstring供应商ID(如10deNVIDIA
devicestring设备ID
subsystem_vendorstring子系统供应商ID
subsystem_devicestring子系统设备ID
sriov_totalvfsstring最大VF数量
iommu_group/typestringIOMMU组类型
iommu/intel-iommu/versionstringIntel IOMMU版本

常见设备类别

类别代码说明
0200网络控制器
0300显示控制器(GPU
03023D控制器
0108NVMe控制器
1200处理加速器(FPGA等)

常见供应商ID

供应商ID厂商
10deNVIDIA
1002AMD
8086Intel
15b3Mellanox
10eeXilinx

示例

# 检测NVIDIA GPU
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
class: {op: In, value: ["0300", "0302"]}

# 检测NVMe设备
matchFeatures:
- feature: pci.device
matchExpressions:
class: {op: In, value: ["0108"]}

存储相关特性

storage.block

块存储设备(Instance类型)

属性名称类型说明
namestring设备名称
daxstring是否支持直接访问
rotationalstring是否为机械硬盘
nr_zonesstring分区数量
zonedstring是否为分区设备

示例

# 检测SSD
matchFeatures:
- feature: storage.block
matchExpressions:
rotational: {op: In, value: ["0"]}

USB设备特性

usb.device

USB设备(Instance类型)

属性名称类型说明
classstring设备类别
vendorstring供应商ID
devicestring设备ID
serialstring序列号

示例

matchFeatures:
- feature: usb.device
matchExpressions:
vendor: {op: In, value: ["2c7c"]} # Quectel
class: {op: In, value: ["02"]} # CDC设备

系统相关特性

system.osrelease

操作系统信息(Attribute类型)

来自/etc/os-release的所有字段都可用,常见的有:

属性名称类型说明
IDstring系统ID(如ubuntu
VERSION_IDstring版本号
NAMEstring系统名称
PRETTY_NAMEstring友好名称

示例

matchFeatures:
- feature: system.osrelease
matchExpressions:
ID: {op: In, value: ["ubuntu", "debian"]}
VERSION_ID: {op: InRegexp, value: ["^(20|22)\\."]}

system.dmiid

DMIDesktop Management Interface)信息(Attribute类型)

属性名称类型说明
sys_vendorstring系统供应商
product_namestring产品名称

示例

matchFeatures:
- feature: system.dmiid
matchExpressions:
sys_vendor: {op: In, value: ["Dell Inc.", "HP"]}

system.name

系统名称(Attribute类型)

属性名称类型说明
nodenamestringKubernetes节点名称

Local特性

local.label

本地标签(Attribute类型)

来自本地特性文件(/etc/kubernetes/node-feature-discovery/features.d/)的标签。

local.feature

本地特性(Attribute类型)

来自本地特性文件的特性(使用#+no-label标记的)。

Rule特性

rule.matched

前序规则匹配结果(Attribute类型)

用于引用前面规则定义的标签和变量:

- name: "first rule"
vars:
has_gpu: "true"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}

- name: "second rule"
labels:
"gpu-node": "true"
matchFeatures:
- feature: rule.matched
matchExpressions:
has_gpu: {op: IsTrue}

实战示例

示例1:GPU型号精确检测

根据PCI设备ID精确识别GPU型号:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: nvidia-gpu-models
spec:
rules:
- name: "nvidia a100 40gb"
labels:
"nvidia.com/gpu.product": "A100-SXM4-40GB"
"nvidia.com/gpu.memory": "40gb"
"nvidia.com/gpu.architecture": "ampere"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["20b0"]}

- name: "nvidia a100 80gb"
labels:
"nvidia.com/gpu.product": "A100-SXM4-80GB"
"nvidia.com/gpu.memory": "80gb"
"nvidia.com/gpu.architecture": "ampere"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["20b2"]}

- name: "nvidia v100"
labels:
"nvidia.com/gpu.product": "V100-SXM2-16GB"
"nvidia.com/gpu.memory": "16gb"
"nvidia.com/gpu.architecture": "volta"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["1db4", "1db5", "1db6"]}

- name: "nvidia t4"
labels:
"nvidia.com/gpu.product": "Tesla-T4"
"nvidia.com/gpu.memory": "16gb"
"nvidia.com/gpu.architecture": "turing"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["1eb8"]}

示例2:CPU性能分级

根据CPU特性对节点进行性能分级:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: cpu-performance-classification
spec:
rules:
- name: "tier-1 flagship cpu"
labels:
"cpu.performance.tier": "1"
"cpu.performance.class": "flagship"
"workload.suitable": "hpc,ai-training,database"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}
AVX512DQ: {op: Exists}
AVX512CD: {op: Exists}
AVX512BW: {op: Exists}
AVX512VL: {op: Exists}
- feature: cpu.topology
matchExpressions:
hardware_multithreading: {op: IsTrue}
- feature: cpu.model
matchExpressions:
family: {op: Ge, value: ["6"]}

- name: "tier-2 performance cpu"
labels:
"cpu.performance.tier": "2"
"cpu.performance.class": "performance"
"workload.suitable": "general-compute,web-server"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX2: {op: Exists}
AVX512F: {op: DoesNotExist}
- feature: cpu.topology
matchExpressions:
hardware_multithreading: {op: IsTrue}

- name: "tier-3 standard cpu"
labels:
"cpu.performance.tier": "3"
"cpu.performance.class": "standard"
"workload.suitable": "batch-job,cache"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX: {op: Exists}
AVX2: {op: DoesNotExist}

- name: "tier-4 basic cpu"
labels:
"cpu.performance.tier": "4"
"cpu.performance.class": "basic"
"workload.suitable": "logging,monitoring"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
SSE42: {op: Exists}
AVX: {op: DoesNotExist}

示例3:高速网络设备检测

检测并分类高速网络设备:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: high-speed-network-detection
spec:
rules:
- name: "mellanox 100gbe"
labels:
"network.type": "high-speed"
"network.speed": "100gbe"
"network.vendor": "mellanox"
"network.sriov": "true"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["15b3"]} # Mellanox
class: {op: In, value: ["0200"]}
- feature: network.device
matchExpressions:
sriov_totalvfs: {op: Gt, value: ["0"]}
speed: {op: Ge, value: ["100000"]}

- name: "intel 25gbe"
labels:
"network.type": "high-speed"
"network.speed": "25gbe"
"network.vendor": "intel"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["8086"]}
class: {op: In, value: ["0200"]}
- feature: network.device
matchExpressions:
speed: {op: GeLe, value: ["25000", "100000"]}

- name: "standard 10gbe"
labels:
"network.type": "standard"
"network.speed": "10gbe"
matchFeatures:
- feature: network.device
matchExpressions:
speed: {op: GeLe, value: ["10000", "25000"]}

- name: "basic 1gbe"
labels:
"network.type": "basic"
"network.speed": "1gbe"
matchFeatures:
- feature: network.device
matchExpressions:
speed: {op: Lt, value: ["10000"]}

示例4:存储类型分类

根据存储设备特性进行分类:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: storage-classification
spec:
rules:
- name: "nvme ssd"
labels:
"storage.type": "nvme-ssd"
"storage.performance": "ultra"
"storage.tier": "1"
annotations:
"storage.description": "NVMe SSD with high IOPS"
matchFeatures:
- feature: pci.device
matchExpressions:
class: {op: In, value: ["0108"]} # NVMe controller
- feature: storage.block
matchExpressions:
rotational: {op: In, value: ["0"]}

- name: "sata ssd"
labels:
"storage.type": "sata-ssd"
"storage.performance": "high"
"storage.tier": "2"
matchFeatures:
- feature: storage.block
matchExpressions:
rotational: {op: In, value: ["0"]}
- feature: pci.device
matchExpressions:
class: {op: NotIn, value: ["0108"]}

- name: "hdd"
labels:
"storage.type": "hdd"
"storage.performance": "standard"
"storage.tier": "3"
matchFeatures:
- feature: storage.block
matchExpressions:
rotational: {op: In, value: ["1"]}

示例5:内核版本和模块检测

检测内核版本和关键模块:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: kernel-features-detection
spec:
rules:
- name: "modern kernel"
labels:
"kernel.version.class": "modern"
"kernel.support.level": "full"
matchFeatures:
- feature: kernel.version
matchExpressions:
major: {op: Ge, value: ["5"]}
minor: {op: Ge, value: ["10"]}

- name: "rdma support"
labels:
"kernel.feature.rdma": "true"
"network.rdma.enabled": "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
rdma_cm: {op: Exists}
ib_core: {op: Exists}
mlx5_core: {op: Exists}

- name: "ceph rbd support"
labels:
"storage.ceph.rbd": "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
rbd: {op: Exists}
libceph: {op: Exists}

- name: "ovs support"
labels:
"network.ovs.enabled": "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
openvswitch: {op: Exists}

示例6:安全合规检测

检测安全特性和合规性:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: security-compliance-check
spec:
rules:
- name: "pci-dss compliant"
labels:
"compliance.pci-dss": "true"
"security.level": "high"
taints:
- key: "security.high"
value: "true"
effect: NoSchedule
matchFeatures:
- feature: kernel.version
matchExpressions:
major: {op: Ge, value: ["5"]}
- feature: kernel.config
matchExpressions:
AUDIT: {op: In, value: ["y"]}
SECURITY_SELINUX: {op: In, value: ["y"]}
- feature: kernel.selinux
matchExpressions:
enabled: {op: IsTrue}

- name: "intel sgx capable"
labels:
"security.intel-sgx": "true"
"tee.available": "true"
annotations:
"security.sgx.epc-size": "@cpu.security.sgx.epc"
matchFeatures:
- feature: cpu.security
matchExpressions:
sgx.enabled: {op: IsTrue}
sgx.epc: {op: Gt, value: ["0"]}

- name: "amd sev capable"
labels:
"security.amd-sev": "true"
"vm-encryption.available": "true"
matchFeatures:
- feature: cpu.security
matchExpressions:
sev.enabled: {op: IsTrue}

示例7:使用模板动态生成标签

利用模板功能批量生成标签:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: dynamic-pci-labels
spec:
rules:
- name: "enumerate all pci devices"
labelsTemplate: |
{{ range .pci.device }}
pci.device.{{ .vendor }}-{{ .device }}.present=true
{{ end }}
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: Exists}

- name: "count gpu devices"
labelsTemplate: |
gpu.count={{ len .pci.device }}
{{ $first := index .pci.device 0 }}
gpu.primary.vendor={{ $first.vendor }}
gpu.primary.device={{ $first.device }}
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de", "1002"]} # NVIDIA or AMD
class: {op: In, value: ["0300", "0302"]}

- name: "network interfaces summary"
labelsTemplate: |
{{ $total := 0 }}
{{ $highspeed := 0 }}
{{ range .network.device }}
{{ $speed := .speed | int }}
{{ $total = add $total 1 }}
{{ if ge $speed 10000 }}
{{ $highspeed = add $highspeed 1 }}
{{ end }}
{{ end }}
network.interfaces.total={{ $total }}
network.interfaces.highspeed={{ $highspeed }}
matchFeatures:
- feature: network.device
matchExpressions:
name: {op: Exists}

示例8:复杂规则组合和变量传递

使用变量构建复杂的规则层次:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: advanced-node-classification
spec:
rules:
# 第一步:检测基础硬件
- name: "detect-gpu"
vars:
has_nvidia_gpu: "true"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
class: {op: In, value: ["0300", "0302"]}

- name: "detect-high-end-cpu"
vars:
has_avx512: "true"
cpu_tier: "high"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}
- feature: cpu.topology
matchExpressions:
socket_count: {op: Ge, value: ["2"]}

- name: "detect-fast-network"
vars:
has_fast_network: "true"
matchFeatures:
- feature: network.device
matchExpressions:
speed: {op: Ge, value: ["25000"]}
sriov_totalvfs: {op: Gt, value: ["0"]}

# 第二步:基于变量组合生成高层标签
- name: "ai-training-node"
labels:
"node.workload.type": "ai-training"
"node.tier": "premium"
taints:
- key: "workload.ai-training"
value: "true"
effect: NoSchedule
matchFeatures:
- feature: rule.matched
matchExpressions:
has_nvidia_gpu: {op: IsTrue}
has_avx512: {op: IsTrue}
has_fast_network: {op: IsTrue}

- name: "ai-inference-node"
labels:
"node.workload.type": "ai-inference"
"node.tier": "standard"
matchFeatures:
- feature: rule.matched
matchExpressions:
has_nvidia_gpu: {op: IsTrue}
has_avx512: {op: DoesNotExist}

- name: "hpc-compute-node"
labels:
"node.workload.type": "hpc-compute"
"node.tier": "premium"
matchFeatures:
- feature: rule.matched
matchExpressions:
has_avx512: {op: IsTrue}
has_fast_network: {op: IsTrue}
has_nvidia_gpu: {op: DoesNotExist}

- name: "general-purpose-node"
labels:
"node.workload.type": "general-purpose"
"node.tier": "standard"
matchFeatures:
- feature: rule.matched
matchExpressions:
has_nvidia_gpu: {op: DoesNotExist}
has_avx512: {op: DoesNotExist}

参考资料