使用volcano-vgpu时,不需要 安装HAMi,仅使用Volcano vgpu device-plugin即可。它可以为由volcano管理的NVIDIA设备提供设备共享机制。
该插件源码基于Nvidia Device Plugin开发,并使用HAMi-core实现对GPU卡的硬隔离支持。
Volcano vgpu仅在volcano > 1.9版本中可用。
1. 准备工作
1.1 镜像准备
Volcano调度器已集成支持HAMI vGPU,我们需要以前准备以下镜像到本地集群中:
docker.io/projecthami/volcano-vgpu-device-plugin:v1.11.0
下载到本地harbor中,新地址为:
aiharbor.msxf.local/test/projecthami/volcano-vgpu-device-plugin:v1.11.0
1.2 节点准备
1.2.1 vGPU节点标签污点
需要给需要安装vGPU组件的节点打上特定的标签和污点,以方便管理启用vGPU的节点,并且避免原有的nvidia device plugin对该vGPU节点的资源管理造成影响和资源分配冲突。
标签如下:
volcano.sh/vgpu.enabled: "true"
污点如下:
volcano.sh/vgpu=hami:NoSchedule
1.2.2 卸载vGPU节点的nvidia-device-plugin
在已经给vGPU节点标记标签和污点后,需要通过kubectl delete指令删除对应vGPU节点的nvidia device plugin pod,例如:
kubectl delete pod nvidia-device-plugin-daemonset-vx6fk
2. 执行部署
2.1 部署volcano-vgpu-device-plugin
2.1.1 部署文件
注意部署文件中Daemonset中的nodeSelector及tolerations:volcano-vgpu-device-plugin.yaml
2.1.2 配置说明
Volcano vGPU的默认配置如下:
nvidia:
resourceCountName: volcano.sh/vgpu-number
resourceMemoryName: volcano.sh/vgpu-memory
resourceMemoryPercentageName: volcano.sh/vgpu-memory-percentage
resourceCoreName: volcano.sh/vgpu-cores
overwriteEnv: false
defaultMemory: 0
defaultCores: 0
defaultGPUNum: 1
deviceSplitCount: 10
deviceMemoryScaling: 1
deviceCoreScaling: 1
gpuMemoryFactor: 1
knownMigGeometries: []
关键配置项说明:
| 配置项 | 说明 | 示例 |
|---|---|---|
resourceCountName | vGPU个数的资源名称 | volcano.sh/vgpu-number |
resourceMemoryName | vGPU显存大小的资源名称 | volcano.sh/vgpu-memory |
resourceCoreName | vGPU算力的资源名称 | volcano.sh/vgpu-cores |
resourceMemoryPercentageName | vGPU显存比例的资源名称,仅用在Pod的资源申请中 | volcano.sh/vgpu-memory-percentage |
deviceSplitCount | GPU分割数,每张GPU最多可同时运行的任务数 | 10 |
2.1.3 执行结果
$ kubectl apply -f volcano-vgpu-device-plugin.yaml
configmap/volcano-vgpu-device-config created
configmap/volcano-vgpu-node-config created
serviceaccount/volcano-device-plugin created
clusterrole.rbac.authorization.k8s.io/volcano-device-plugin created
clusterrolebinding.rbac.authorization.k8s.io/volcano-device-plugin created
Warning: spec.template.metadata.annotations[scheduler.alpha.kubernetes.io/critical-pod]: non-functional in v1.16+; use the "priorityClassName" field instead
daemonset.apps/volcano-device-plugin created
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
volcano-admission-7dc9b78fc6-686tb 1/1 Running 0 20d
volcano-admission-7dc9b78fc6-d9vzk 1/1 Running 0 20d
volcano-admission-7dc9b78fc6-h2ssl 1/1 Running 0 20d
volcano-controllers-855c676dd4-4gpxp 1/1 Running 1 (13d ago) 20d
volcano-controllers-855c676dd4-pspzg 1/1 Running 0 20d
volcano-controllers-855c676dd4-zl8cd 1/1 Running 0 20d
volcano-device-plugin-7g6v2 2/2 Running 0 22s
volcano-scheduler-6645c59d6d-56xdc 1/1 Running 0 6m58s
volcano-scheduler-6645c59d6d-p549s 1/1 Running 0 6m58s
volcano-scheduler-6645c59d6d-pqt68 1/1 Running 0 6m58s
查看节点vGPU资源,可以看到原有的nvidia device plugin注入的资源nvidia.com/gpu已经清空,新生成了vGPU相关的资源volcano.sh/vgpu-cores、volcano.sh/vgpu-memory及volcano.sh/vgpu-number。
# ...
Capacity:
cpu: 128
ephemeral-storage: 562291Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 263746296Ki
nvidia.com/gpu: 0
nvidia.com/gpu.shared: 0
pods: 110
volcano.sh/vgpu-cores: 800
volcano.sh/vgpu-memory: 196512
volcano.sh/vgpu-number: 80
Allocatable:
cpu: 127600m
ephemeral-storage: 562291Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 256048108548
nvidia.com/gpu: 0
nvidia.com/gpu.shared: 0
pods: 110
volcano.sh/vgpu-cores: 800
volcano.sh/vgpu-memory: 196512
volcano.sh/vgpu-number: 80
# ...
自动生成的资源项说明:
| 资源项 | 说明 | 示例 |
|---|---|---|
volcano.sh/vgpu-cores | vGPU算力的资源量百分比,是节点上总卡数*100 | 800 |
volcano.sh/vgpu-memory | vGPU显存的总资源量,单位Mi,是节点上总卡数*单卡显存数。由于4090显卡的单卡显存为24564Mi,那么这里的总显存量为196512Mi | 196512 |
volcano.sh/vgpu-number | vGPU个数,是节点上总卡数*deviceSplitCount配置 | 80 |
2.2 启用volcano调度器支持vGPU
修改volcano-scheduler-configmap,增加以下插件支持:
- name: deviceshare
arguments:
# 是否启用vgpu特性
deviceshare.VGPUEnable: true
# volcano-vgpu-device-config这个ConfigMap对应的命名空间
# 便于调度器自动读取ConfigMap内容
deviceshare.KnownGeometriesCMNamespace: volcano-system
修改后内容如下(仅供示例参考,具体根据自身需要调整volcano action和plugin配置):
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
- name: priority
- name: gang
enablePreemptable: false
- name: conformance
- plugins:
- name: drf
enablePreemptable: false
- name: deviceshare
arguments:
# 是否启用vgpu特性
deviceshare.VGPUEnable: true
# volcano-vgpu-device-config这个ConfigMap对应的命名空间
# 便于调度器自动读取ConfigMap内容
deviceshare.KnownGeometriesCMNamespace: volcano-system
- name: predicates
- name: capacity-card
arguments:
cardUnlimitedCpuMemory: true
- name: nodeorder
- name: binpack
修改后重启volcano-scheduler。
3. 运行测试
3.1 vGPU基本使用
该测试Pod使用的镜像为nvidia/cuda:12.2.0-base,下载到本地集群harbor仓库的镜像地址aiharbor.msxf.local/test/nvidia/cuda:12.2.0-base-ubuntu22.04:
apiVersion: v1
kind: Pod
metadata:
name: test-vgpu
spec:
# 需要使用volcano调度器
schedulerName: volcano
# 容忍所有污点,仅做测试
tolerations:
- key: volcano.sh/vgpu
operator: Exists
effect: NoSchedule
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
- key: nvidia.com/gpu.product
operator: Exists
effect: NoSchedule
- key: special.accelerate.usage
operator: Exists
effect: NoSchedule
- key: maip.msxf.io/node.usage
operator: Exists
effect: NoSchedule
- key: maip.msxf.io/ib.present
operator: Exists
effect: NoSchedule
containers:
- name: cuda-container
image: aiharbor.msxf.local/test/nvidia/cuda:12.2.0-base-ubuntu22.04
command: ["sleep"]
args: ["100000"]
resources:
requests:
volcano.sh/vgpu-number: 2 # (必须)请求 2 张 GPU 卡
volcano.sh/vgpu-memory: 3000 # (可选)每个 vGPU 使用 3G 显存,超过单卡显存则用最大单卡显存
volcano.sh/vgpu-cores: 50 # (可选)每个 vGPU 使用 50% 核心
limits:
volcano.sh/vgpu-number: 2
volcano.sh/vgpu-memory: 3000
volcano.sh/vgpu-cores: 50
运行后,查看Pod信息:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
test-vgpu 1/1 Running 0 23s
进入Pod容器执行nvidia-smi命令查看vGPU资源信息,执行以下指令:
kubectl exec -it test-vgpu bash
查看vGPU资源信息如下:
root@test-vgpu:/# nvidia-smi
[HAMI-core Msg(18:140441960732480:libvgpu.c:839)]: Initializing.....
Mon Nov 24 12:13:31 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08 Driver Version: 580.105.08 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:BA:00.0 Off | Off |
| 30% 35C P8 13W / 450W | 0MiB / 3000MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 On | 00000000:BB:00.0 Off | Off |
| 30% 33C P8 24W / 450W | 0MiB / 3000MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
[HAMI-core Msg(18:140441960732480:multiprocess_memory_limit.c:455)]: Calling exit handler 18
root@test-vgpu:/#
在标准输出中以HAMI-core开头的信息属于HAMI-core通过CUDA API劫持的调试信息,表示HAMI-core实际以及起作用,例如[HAMI-core Msg(18:140441960732480:multiprocess_memory_limit.c:455)]: Calling exit handler 18表示是由HAMi-core组件执行完成,它会在nvidia-smi命令末尾执行一些资源清理工作。
3.2 使用nvidia-device-plugin的资源
原本使用nvidia device plugin的节点资源不会受影响,部署的Pod YAML如下:
apiVersion: v1
kind: Pod
metadata:
name: test-nvidia-device-plugin
spec:
# 需要使用volcano调度器
schedulerName: volcano
# 容忍所有污点,仅做测试
tolerations:
- key: volcano.sh/vgpu
operator: Exists
effect: NoSchedule
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
- key: nvidia.com/gpu.product
operator: Exists
effect: NoSchedule
- key: special.accelerate.usage
operator: Exists
effect: NoSchedule
- key: maip.msxf.io/node.usage
operator: Exists
effect: NoSchedule
- key: maip.msxf.io/ib.present
operator: Exists
effect: NoSchedule
containers:
- name: cuda-container
image: aiharbor.msxf.local/test/nvidia/cuda:12.2.0-base-ubuntu22.04
command: ["sleep"]
args: ["100000"]
resources:
requests:
nvidia.com/gpu: 2 # 使用nvidia device plugin注册的资源
limits:
nvidia.com/gpu: 2
3.3 vGPU资源与NVIDIA资源名称兼容
按照节点维度启用vGPU后,该vGPU节点只能使用vGPU的资源名称进行Pod资源申请,无法再使用原有的资源名称调度到vGPU节点上。
Volcano vGPU也支持将整卡和vGPU进行兼容性的资源名称配置,例如将vGPU的资源名称和nvidia的资源名称保持一致(nvidia.com/gpu)。我们来做一下兼容性测试。
3.3.1 配置文件变化
调整vGPU全局资源名称的配置如下(resourceCountName配置从volcano.sh/vgpu-number改为nvidia.com/gpu):
nvidia:
resourceCountName: nvidia.com/gpu
resourceMemoryName: volcano.sh/vgpu-memory
resourceMemoryPercentageName: volcano.sh/vgpu-memory-percentage
resourceCoreName: volcano.sh/vgpu-cores
overwriteEnv: false
defaultMemory: 0
defaultCores: 0
defaultGPUNum: 1
deviceSplitCount: 10
deviceMemoryScaling: 1
deviceCoreScaling: 1
gpuMemoryFactor: 1
knownMigGeometries: []
随后重启volcano-vgpu-device-plugin,发现volcano-vgpu-device-plugin组件的资源名称并未在节点上发现没有生效,经过查看volcano-vgpu-device-plugin和volcano的deviceshare插件的源码,发现:
volcano-vgpu-device-config的ConfigMap配置文件只是给volcano的deviceshare插件使用的。volcano-vgpu-device-plugin组件的源码中忽略了ConfigMap的该配置,而是通过命令行参数指定资源名称,其支持的命令行参数如下:命令行参数 说明 默认值 resource-namevGPU个数的资源名称,生成到节点上volcano.sh/vgpu-numberresource-memory-namevGPU显存大小的资源名称,生成到节点上volcano.sh/vgpu-memoryresource-core-namevGPU算力的资源名称,生成到节点上volcano.sh/vgpu-coresdebug是否开启调试模式 false- 这两个组件的相关配置项需要保持一致,否则无法部署
Pod。
将命令行参数:
containers:
- image: aiharbor.msxf.local/test/projecthami/volcano-vgpu-device-plugin:v1.11.0
args: ["--device-split-count=10"]
调整为:
containers:
- image: aiharbor.msxf.local/test/projecthami/volcano-vgpu-device-plugin:v1.11.0
args: [
"--device-split-count=10",
"--resource-name=nvidia.com/gpu"
]
3.3.2 部署文件示例
这是完整的volcano-vgpu-device-plugin组件部署文件,仅供参考:volcano-vgpu-device-config.compatible.yaml
执行后,volcano-vgpu-device-plugin组件会重启,同时手动重启volcano scheduler,随后查看vGPU节点资源情况如下,可以看到,vGPU的卡资源名称和NVIDIA保持一致,使用的是nvidia.com/gpu:
# ...
Capacity:
cpu: 128
ephemeral-storage: 562291Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 263746296Ki
nvidia.com/gpu: 80
nvidia.com/gpu.shared: 0
pods: 110
volcano.sh/vgpu-cores: 800
volcano.sh/vgpu-memory: 196512
volcano.sh/vgpu-number: 80
Allocatable:
cpu: 127600m
ephemeral-storage: 562291Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 256048108548
nvidia.com/gpu: 80
nvidia.com/gpu.shared: 0
pods: 110
volcano.sh/vgpu-cores: 800
volcano.sh/vgpu-memory: 196512
volcano.sh/vgpu-number: 0
# ...
3.3.3 测试文件示例
运行以下示例将Pod调度到vGPU节点上:
apiVersion: v1
kind: Pod
metadata:
name: test-vgpu-compatible
spec:
# 需要使用volcano调度器
schedulerName: volcano
# 新增节点选择,运行到vGPU节点上
nodeSelector:
name: dev-app-2-150-master-1
# 容忍所有污点,仅做测试
tolerations:
- key: volcano.sh/vgpu
operator: Exists
effect: NoSchedule
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
- key: nvidia.com/gpu.product
operator: Exists
effect: NoSchedule
- key: special.accelerate.usage
operator: Exists
effect: NoSchedule
- key: maip.msxf.io/node.usage
operator: Exists
effect: NoSchedule
- key: maip.msxf.io/ib.present
operator: Exists
effect: NoSchedule
containers:
- name: cuda-container
image: aiharbor.msxf.local/test/nvidia/cuda:12.2.0-base-ubuntu22.04
command: ["sleep"]
args: ["100000"]
resources:
requests:
nvidia.com/gpu: 2 # 请求 2 张 GPU 卡
limits:
nvidia.com/gpu: 2
执行后,可以看到Pod已经被成功调度和运行。进入Pod容器查看资源情况,可以看到申请的算力和显存是按照整卡来分配的,这也是HAMi vGPU默认的行为,以便于和原有的NVIDIA device plugin兼容:
$ kubectl exec -it test-vgpu-compatible bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@test-vgpu-compatible:/# nvidia-smi
[HAMI-core Msg(15:139748339885888:libvgpu.c:839)]: Initializing.....
Tue Nov 25 09:26:16 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08 Driver Version: 580.105.08 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:BA:00.0 Off | Off |
| 30% 34C P8 13W / 450W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 On | 00000000:BB:00.0 Off | Off |
| 30% 32C P8 25W / 450W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
[HAMI-core Msg(15:139748339885888:multiprocess_memory_limit.c:455)]: Calling exit handler 15
root@test-vgpu-compatible:/#
4. 监控指标
Volcano vgpu的指标通过volcano scheduler暴露,可以通过进入集群中任一支持curl命令的Pod,随后curl一下volcano scheduler的接口地址,例如:
# 10.233.75.65为主volcano scheduler的ClusterIP
curl 10.233.75.65:8080/metrics
返回的指标比较重,其中与vGPU相关的指标:volcano-vgpu-metrics.txt
5. 常见问题
5.1 vGPU Pod部署时报错UnexpectedAdmissionError
在调整完volcano-vgpu-device-config这个ConfigMap中的resourceCountName配置项为自定义的资源名称后,Pod部署时状态为UnexpectedAdmissionError:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
test-vgpu-compatible 0/1 UnexpectedAdmissionError 0 75s
通过kubectl describe pod查看Pod的Events如下:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25s volcano Successfully assigned volcano-system/test-vgpu-compatible to dev-app-2-150-master-1
Warning UnexpectedAdmissionError 25s kubelet Allocate failed due to rpc error: code = Unknown desc = device request not found, which is unexpected
通过翻查volcano和volcano-vgpu-device-plugin源码,经过排查是配置文件不一致引起的。在修改资源名称时,我们需要保证3个地方的配置正确性和一致性,拿resourceCountName配置项修改为nvidia.com/gpu举例,需要调整以下地方:
volcano-vgpu-device-config的resourceCountName: nvidia.com/gpuvolcano-vgpu-device-plugin命令行参数--resource-name=nvidia.com/gpuvolcano-scheduler-configmap的deviceshare插件需要指定正确的命名空间,如下:可以通过查看调度的日志来排查调度器使用的配置文件是否正确,命令如下:- name: deviceshare
arguments:
# 是否启用vgpu特性
deviceshare.VGPUEnable: true
# volcano-vgpu-device-config这个ConfigMap对应的命名空间
# 便于调度器自动读取ConfigMap内容
deviceshare.KnownGeometriesCMNamespace: volcano-system$ kubectl logs volcano-scheduler-6645c59d6d-bcw68 | grep "device config"
I1125 09:11:57.408175 1 config.go:113] "Initializing volcano device config" device-configs={"NvidiaConfig":{"ResourceCountName":"nvidia.com/gpu","ResourceMemoryName":"volcano.sh/vgpu-memory","ResourceCoreName":"volcano.sh/vgpu-cores","ResourceMemoryPercentageName":"volcano.sh/vgpu-memory-percentage","ResourcePriority":"","OverwriteEnv":false,"DefaultMemory":0,"DefaultCores":0,"DefaultGPUNum":1,"DeviceSplitCount":10,"DeviceMemoryScaling":1,"DeviceCoreScaling":1,"DisableCoreLimit":false,"MigGeometriesList":[],"GPUMemoryFactor":1}}