English version | 中文版 (Chinese version) | 日本語版 (Japanese version)

HAMi logo


HAMi

Kubernetes GPU virtualization and heterogeneous accelerator scheduling for AI infrastructure.

HAMi Architecture

HAMi stands for Heterogeneous AI Computing Virtualization Middleware. Formerly known as k8s-vGPU-scheduler, HAMi helps platform teams share expensive GPUs and other AI accelerators across Kubernetes workloads, isolate device memory and compute, and schedule pods with device-aware policies without changing application code.

HAMi is a CNCF Sandbox project and part of the CNCF Landscape. It is also listed in the CNAI Landscape.

CNCF logo

Why HAMi?

AI infrastructure teams often run into the same Kubernetes accelerator problems: whole GPUs are allocated to small jobs, teams compete for scarce devices, different accelerator vendors expose different operational models, and schedulers lack enough device context to place workloads efficiently.

HAMi provides a Kubernetes-native layer for:

  • Device sharing: allocate a fraction of a physical accelerator by memory, core, or device count (see the sketch after this list).
  • Resource isolation: enforce per-workload accelerator memory and compute limits where the device backend supports it.
  • Device-aware scheduling: place pods with topology-aware, binpack, spread, and device-specific scheduling policies.
  • Heterogeneous AI clusters: manage NVIDIA GPUs, NPUs, DCUs, MLUs, and other accelerator types through one scheduling and allocation workflow.
  • Zero application changes: keep using standard Kubernetes resource requests and limits.
  • Production operations: expose metrics, dashboards, WebUI, Helm installation, and community-supported deployment guidance.
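As a sketch of fractional sharing, a container can request part of one GPU's memory and compute. The nvidia.com/gpumem and nvidia.com/gpucores resource names follow HAMi's NVIDIA backend documentation (gpumem in MB, gpucores as a percentage of the device's compute); treat the exact values as illustrative:

resources:
  limits:
    nvidia.com/gpu: 1        # one physical GPU
    nvidia.com/gpumem: 3000  # 3,000 MB of that GPU's memory
    nvidia.com/gpucores: 30  # roughly 30% of that GPU's compute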

Use Cases

  • Increase GPU utilization in shared Kubernetes AI clusters.
  • Run multi-tenant notebook, training, and inference workloads on the same accelerator pool.
  • Build private cloud AI platforms with fair device allocation and quota control.
  • Operate heterogeneous accelerator clusters across NVIDIA, Ascend, Cambricon, Hygon, Iluvatar, MetaX, Moore Threads, and other vendors.
  • Combine HAMi with Kubernetes schedulers such as kube-scheduler and Volcano for batch AI workloads.

How It Works

HAMi is composed of a mutating webhook, a scheduler extender, device plugins, and device-specific in-container virtualization components.

Pod submission
  -> HAMi mutating webhook
  -> HAMi scheduler filter / score / bind
  -> device allocation written to pod annotations
  -> device plugin Allocate()
  -> container runtime environment
  -> HAMi monitor and metrics
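To make the "device allocation written to pod annotations" step concrete: after scheduling, the pod carries the allocation decision as annotations, which the device plugin reads during Allocate(). The exact keys and value encoding are HAMi-internal and vary by version; the shape below is an illustrative assumption, not a stable API:

metadata:
  annotations:
    # illustrative only: annotation keys and encoding are HAMi-internal
    hami.io/vgpu-devices-allocated: "GPU-<uuid>,NVIDIA,3000,30:;"
    hami.io/vgpu-node: "gpu-node-1"  # hypothetical node name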

Device Virtualization

HAMi lets workloads request only the accelerator resources they need. For example, the following pod asks for one physical NVIDIA GPU with 3,000 MB of GPU memory (gpumem is denominated in MB):

resources:
  limits:
    nvidia.com/gpu: 1
    nvidia.com/gpumem: 3000

The workload sees the allocated device resources inside the container, while HAMi coordinates scheduling, allocation, and isolation.
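Wrapped in a full manifest, the request might look like this minimal sketch (pod name, container name, and image are placeholders, not taken from the HAMi examples):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo                 # placeholder name
spec:
  containers:
    - name: cuda-container       # placeholder name
      image: nvidia/cuda:12.4.0-base-ubuntu22.04  # example image only
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1        # one physical GPU
          nvidia.com/gpumem: 3000  # 3,000 MB of device memory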

HAMi Example

Notes:

  1. After installing HAMi, the value of nvidia.com/gpu registered on the node defaults to the number of vGPUs.
  2. When requesting resources in a pod, nvidia.com/gpu refers to the number of physical GPUs required by the current pod.

Supported Devices

HAMi supports multiple heterogeneous accelerator backends, including GPUs, NPUs, DCUs, MLUs, GCUs, XPUs, and more. Device capabilities vary by vendor, model, driver, and hardware generation.

See the current HAMi supported devices page for the maintained support matrix.

Quick Start

Prerequisites

For the NVIDIA device plugin path, prepare:

  • NVIDIA driver >= 440
  • nvidia-docker version > 2.0
  • NVIDIA configured as the default runtime for containerd, Docker, or CRI-O
  • Kubernetes >= 1.23
  • glibc >= 2.17 and < 2.30
  • Linux kernel >= 3.10
  • Helm > 3.0

Install With Helm

Label GPU nodes so HAMi can manage them:

kubectl label nodes <node-name> gpu=on

Add the HAMi Helm repository:

helm repo add hami-charts https://project-hami.github.io/HAMi/
helm repo update

Install HAMi:

helm install hami hami-charts/hami -n kube-system

Verify that the scheduler and device plugin are running:

kubectl get pods -n kube-system

When the hami-device-plugin and hami-scheduler pods are both Running, submit an example workload:

kubectl apply -f examples/nvidia/default_use.yaml

For the complete installation guide and configuration options, see the HAMi documentation.

Scheduling Policies

HAMi supports multiple scheduling modes for AI workloads:

  • binpack: pack workloads onto fewer nodes or devices to improve consolidation.
  • spread: distribute workloads across nodes or devices to reduce contention.
  • topology-aware scheduling: choose device combinations based on GPU topology when supported.
  • dynamic MIG: create and allocate NVIDIA MIG instances dynamically for supported cards and modes.

HAMi works with the default Kubernetes scheduler path and can also be used with Volcano for batch-oriented AI workloads. See the HAMi website for current scheduler integration guides.
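Per-workload policy selection is typically expressed through pod annotations. The annotation keys below follow the HAMi scheduling docs but should be checked against your installed version; treat them as an assumption:

metadata:
  annotations:
    hami.io/node-scheduler-policy: "binpack"  # assumed key: pack pods onto fewer nodes
    hami.io/gpu-scheduler-policy: "spread"    # assumed key: spread across a node's devices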

Observability And WebUI

HAMi exposes metrics for monitoring cluster accelerator usage. After installation, metrics are available through the scheduler monitor endpoint:

http://<scheduler-ip>:<monitor-port>/metrics

The default monitor port is 31993. You can change it with Helm values such as --set scheduler.service.monitorPort=<port>.
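For example, a Prometheus scrape job for this endpoint could look like the following sketch (the job name and target address are placeholders; 31993 is the default monitor port noted above):

scrape_configs:
  - job_name: hami-scheduler          # placeholder job name
    static_configs:
      - targets: ["10.0.0.5:31993"]   # placeholder scheduler IP + default monitor port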

HAMi also provides:

  • HAMi-WebUI for visual cluster and device management.
  • Grafana dashboard examples for accelerator monitoring.
  • Benchmark material for evaluating workload behavior and scheduling effects.

HAMi WebUI

Roadmap, Governance, And Contributing

HAMi is governed by maintainers and contributors. Governance is described in the HAMi community repository.

To contribute code, documentation, tests, or device backend improvements, read CONTRIBUTING.md.

Community

The HAMi community is open to users, contributors, hardware vendors, and platform teams building Kubernetes-based AI infrastructure.

Talks And References

  • CHINA CLOUD COMPUTING INFRASTRUCTURE DEVELOPER CONFERENCE, Beijing 2024: Unlocking heterogeneous AI infrastructure on k8s clusters
  • KubeDay Japan 2024: Unlocking Heterogeneous AI Infrastructure K8s Cluster: Leveraging the Power of HAMi
  • KubeCon + AI_dev Open Source GenAI & ML Summit China 2024: Is Your GPU Really Working Efficiently in the Data Center? N Ways to Improve GPU Usage
  • KubeCon + AI_dev Open Source GenAI & ML Summit China 2024: Unlocking Heterogeneous AI Infrastructure K8s Cluster
  • KubeCon Europe 2024: Cloud Native Batch Computing with Volcano: Updates and Future

License

HAMi is licensed under the Apache License 2.0. See LICENSE for details.

Copyright Contributors to HAMi, established as HAMi, a Series of LF Projects, LLC.
