Kubernetes GPU virtualization and heterogeneous accelerator scheduling for AI infrastructure.
HAMi stands for Heterogeneous AI Computing Virtualization Middleware. Formerly known as k8s-vGPU-scheduler, HAMi helps platform teams share expensive GPUs and other AI accelerators across Kubernetes workloads, isolate device memory and compute, and schedule pods with device-aware policies without changing application code.
HAMi is a CNCF Sandbox and CNCF Landscape project. It is also listed in the CNAI Landscape.
AI infrastructure teams often run into the same Kubernetes accelerator problems: whole GPUs are allocated to small jobs, teams compete for scarce devices, different accelerator vendors expose different operational models, and schedulers lack enough device context to place workloads efficiently.
HAMi provides a Kubernetes-native layer for:
- Device sharing: allocate a fraction of a physical accelerator by memory, core, or device count.
- Resource isolation: enforce per-workload accelerator memory and compute limits where the device backend supports it.
- Device-aware scheduling: place pods with topology-aware, binpack, spread, and device-specific scheduling policies.
- Heterogeneous AI clusters: manage NVIDIA GPUs, NPUs, DCUs, MLUs, and other accelerator types through one scheduling and allocation workflow.
- Zero application changes: keep using standard Kubernetes resource requests and limits.
- Production operations: expose metrics, dashboards, WebUI, Helm installation, and community-supported deployment guidance.
Typical use cases include:
- Increase GPU utilization in shared Kubernetes AI clusters.
- Run multi-tenant notebook, training, and inference workloads on the same accelerator pool.
- Build private cloud AI platforms with fair device allocation and quota control.
- Operate heterogeneous accelerator clusters across NVIDIA, Ascend, Cambricon, Hygon, Iluvatar, MetaX, Moore Threads, and other vendors.
- Combine HAMi with Kubernetes schedulers such as kube-scheduler and Volcano for batch AI workloads.
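The quota-control use case above can lean on standard Kubernetes ResourceQuota, because HAMi resources are exposed as ordinary extended resources. A minimal sketch, assuming the `nvidia.com/gpu` and `nvidia.com/gpumem` resource names shown later in this README (the namespace and limits are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-gpu-quota
  namespace: team-a      # illustrative tenant namespace
spec:
  hard:
    requests.nvidia.com/gpu: "4"        # at most 4 physical GPUs across the namespace
    requests.nvidia.com/gpumem: "16000" # at most 16,000 MiB of vGPU memory in total
```

Extended resources are quota-tracked with the `requests.<resource-name>` prefix, so no HAMi-specific quota machinery is needed.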
HAMi is composed of a mutating webhook, scheduler extender, device plugins, and device-specific in-container virtualization components.
```
Pod submission
  -> HAMi mutating webhook
  -> HAMi scheduler filter / score / bind
  -> device allocation written to pod annotations
  -> device plugin Allocate()
  -> container runtime environment
  -> HAMi monitor and metrics
```
HAMi lets workloads request only the accelerator resources they need. For example, the following pod asks for one physical NVIDIA GPU with 3 GiB of GPU memory:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1       # one physical GPU
    nvidia.com/gpumem: 3000 # 3,000 MiB (~3 GiB) of GPU memory
```

The workload sees the allocated device resources inside the container, while HAMi coordinates scheduling, allocation, and isolation.
Notes:
- After installing HAMi, the value of `nvidia.com/gpu` registered on a node defaults to the number of vGPUs, not physical GPUs.
- When requesting resources in a pod, `nvidia.com/gpu` refers to the number of physical GPUs required by that pod.
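Besides absolute memory in MiB, the HAMi documentation also lists percentage-based NVIDIA resource names. A hedged sketch, assuming `nvidia.com/gpumem-percentage` (percent of device memory) and `nvidia.com/gpucores` (percent of device compute) as described in the HAMi docs; check your installed version for the exact names:

```yaml
resources:
  limits:
    nvidia.com/gpu: 2                 # two physical GPUs for this pod
    nvidia.com/gpumem-percentage: 50  # each limited to 50% of device memory
    nvidia.com/gpucores: 30           # each limited to 30% of device compute
```

Percentage-based requests are convenient in clusters that mix GPU models with different memory sizes.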
HAMi supports multiple heterogeneous accelerator backends, including GPUs, NPUs, DCUs, MLUs, GCUs, XPUs, and more. Device capabilities vary by vendor, model, driver, and hardware generation.
See the current HAMi supported devices page for the maintained support matrix.
For the NVIDIA device plugin path, prepare:
- NVIDIA driver >= 440
- nvidia-docker version > 2.0
- NVIDIA configured as the default runtime for containerd, Docker, or CRI-O
- Kubernetes >= 1.23
- glibc >= 2.17 and < 2.30
- Linux kernel >= 3.10
- Helm > 3.0
Label GPU nodes so HAMi can manage them:
Label GPU nodes so HAMi can manage them:

```shell
kubectl label nodes <node-name> gpu=on
```

Add the HAMi Helm repository:

```shell
helm repo add hami-charts https://project-hami.github.io/HAMi/
helm repo update
```

Install HAMi:

```shell
helm install hami hami-charts/hami -n kube-system
```

Verify that the scheduler and device plugin are running:

```shell
kubectl get pods -n kube-system
```

When `hami-device-plugin` and `hami-scheduler` are both Running, submit an example workload:

```shell
kubectl apply -f examples/nvidia/default_use.yaml
```

For the complete installation guide and configuration options, see the HAMi documentation.
HAMi supports multiple scheduling modes for AI workloads:
- binpack: pack workloads onto fewer nodes or devices to improve consolidation.
- spread: distribute workloads across nodes or devices to reduce contention.
- topology-aware scheduling: choose device combinations based on GPU topology when supported.
- dynamic MIG: create and allocate NVIDIA MIG instances dynamically for supported cards and modes.
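The binpack and spread policies above can be chosen per pod through annotations. A sketch, assuming the `hami.io/node-scheduler-policy` and `hami.io/gpu-scheduler-policy` annotation keys from the HAMi scheduling documentation (verify the keys against your HAMi version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: binpack-example
  annotations:
    hami.io/node-scheduler-policy: "binpack" # consolidate onto fewer nodes
    hami.io/gpu-scheduler-policy: "spread"   # spread across a node's devices
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```

Node-level and device-level policies are independent, so a pod can pack onto busy nodes while still spreading across the GPUs within a node.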
HAMi works with the default Kubernetes scheduler path and can also be used with Volcano for batch-oriented AI workloads. See the HAMi website for current scheduler integration guides.
HAMi exposes metrics for monitoring cluster accelerator usage. After installation, metrics are available through the scheduler monitor endpoint:
```
http://<scheduler-ip>:<monitor-port>/metrics
```

The default monitor port is 31993. You can change it with Helm values such as `--set scheduler.service.monitorPort=<port>`.
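To collect these metrics with Prometheus, a static scrape job pointed at the monitor endpoint is enough. A minimal sketch, assuming the default port 31993 and a placeholder scheduler address:

```yaml
scrape_configs:
  - job_name: hami-scheduler
    metrics_path: /metrics
    static_configs:
      - targets: ["<scheduler-ip>:31993"] # default monitorPort
```

In a real deployment you would typically replace the static target with Kubernetes service discovery against the hami-scheduler Service.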
HAMi also provides:
- HAMi-WebUI for visual cluster and device management.
- Grafana dashboard examples for accelerator monitoring.
- Benchmark material for evaluating workload behavior and scheduling effects.
HAMi is governed by maintainers and contributors. Governance is described in the HAMi community repository.
To contribute code, documentation, tests, or device backend improvements, read CONTRIBUTING.md.
The HAMi community is open to users, contributors, hardware vendors, and platform teams building Kubernetes-based AI infrastructure.
- Website: project-hami.io
- Discord: Join the HAMi Discord (recommended)
- Slack: #hami-dev on CNCF Slack
- Mailing list: hami-project
- Meeting notes and agenda
- Chinese community meeting: Friday 16:00 UTC+8, weekly — Meeting link
- English community meeting: Wednesday 16:00 UTC+8, biweekly — Meeting link
| Event | Talk |
|---|---|
| CHINA CLOUD COMPUTING INFRASTRUCTURE DEVELOPER CONFERENCE, Beijing 2024 | Unlocking heterogeneous AI infrastructure on k8s clusters |
| KubeDay Japan 2024 | Unlocking Heterogeneous AI Infrastructure K8s Cluster: Leveraging the Power of HAMi |
| KubeCon + AI_dev Open Source GenAI & ML Summit China 2024 | Is Your GPU Really Working Efficiently in the Data Center? N Ways to Improve GPU Usage |
| KubeCon + AI_dev Open Source GenAI & ML Summit China 2024 | Unlocking Heterogeneous AI Infrastructure K8s Cluster |
| KubeCon Europe 2024 | Cloud Native Batch Computing with Volcano: Updates and Future |
HAMi is licensed under the Apache License 2.0. See LICENSE for details.
Copyright Contributors to HAMi, established as HAMi, a Series of LF Projects, LLC.