Runtimes

tinygrad supports various runtimes, enabling your code to scale across a wide range of devices. The default runtime is selected automatically based on the available hardware, or you can force a specific runtime to be the default using environment variables (e.g., DEV=CPU).
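For example, you can check which runtime was selected and place a tensor on a specific device (a minimal sketch using tinygrad's Device and Tensor APIs):

from tinygrad import Device, Tensor

# Device.DEFAULT reports the runtime tinygrad selected (or the one forced via DEV)
print(Device.DEFAULT)  # e.g. "METAL", "CUDA", or "CPU"

# a tensor can also be placed on an explicit device, overriding the default
t = Tensor([1.0, 2.0, 3.0], device="CPU")
print(t.device)  # CPU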

| Runtime | Description | Compiler Options | Requirements |
| --- | --- | --- | --- |
| NV | Provides acceleration for NVIDIA GPUs | nvrtc (default), PTX (DEV=NV:PTX) | Ampere/Ada/Blackwell series GPUs. You can select an interface via the DEV variable; see NV Interfaces below. |
| AMD | Provides acceleration for AMD GPUs | LLVM (DEV=AMD:LLVM), HIP/COMGR (DEV=AMD:HIP) | RDNA2 or newer GPUs. You can select an interface via the DEV variable; see AMD Interfaces below. |
| QCOM | Provides acceleration for Qualcomm GPUs | - | 6xx series GPUs |
| METAL | Utilizes Metal for acceleration on Apple devices | - | M1+ Macs; Metal 3.0+ for bfloat support |
| CUDA | Utilizes CUDA for acceleration on NVIDIA GPUs | nvrtc (default), PTX (DEV=CUDA:PTX) | NVIDIA GPU with CUDA support |
| CL | Accelerates computations using OpenCL on GPUs | - | OpenCL 2.0 compatible device |
| CPU | Runs on the CPU using the clang or LLVM compiler | Clang JIT (default), LLVM IR (DEV=CPU:LLVM) | clang compiler in the system PATH |
| WEBGPU | Runs on GPUs via the Dawn WebGPU engine (used in Google Chrome) | - | Dawn library installed and discoverable. Binaries: pydawn v0.3.0 |

Interoperability

tinygrad provides interoperability with OpenCL and PyTorch, allowing efficient tensor data sharing between frameworks through the Tensor.from_blob API. This enables zero-copy operations by working directly with external memory pointers.

Important: When using external memory pointers with tinygrad tensors, you must ensure these pointers remain valid throughout the entire lifetime of the tinygrad tensor to prevent memory corruption.

CUDA/METAL PyTorch Interoperability

You can seamlessly work with CUDA/MPS tensors between PyTorch and tinygrad without data copying:

import torch

from tinygrad import Tensor
from tinygrad.dtype import _from_torch_dtype

# wrap a PyTorch CUDA tensor without copying; tensor1 must stay alive for as
# long as tiny_tensor1 is used, since from_blob does not own the memory
tensor1 = torch.tensor([1.0, 2.0, 3.0], device=torch.device("cuda"))
tiny_tensor1 = Tensor.from_blob(tensor1.data_ptr(), tensor1.shape, dtype=_from_torch_dtype(tensor1.dtype), device='CUDA')

# before tinygrad calculations, synchronize to make sure the data is valid
if tensor1.device.type == "mps": torch.mps.synchronize()
else: torch.cuda.synchronize()

x = (tiny_tensor1 + 1).realize()
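A hedged sketch of the METAL side of the same pattern, assuming a PyTorch MPS tensor's data_ptr() can be wrapped on tinygrad's METAL device in the same way:

# hedged sketch: mirror of the CUDA example for an MPS tensor on macOS
tensor2 = torch.tensor([1.0, 2.0, 3.0], device=torch.device("mps"))
tiny_tensor2 = Tensor.from_blob(tensor2.data_ptr(), tensor2.shape, dtype=_from_torch_dtype(tensor2.dtype), device='METAL')

torch.mps.synchronize()  # make sure the MPS data is valid before tinygrad reads it
y = (tiny_tensor2 + 1).realize()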

QCOM OpenCL Interoperability

tinygrad supports OpenCL interoperability on the QCOM backend.

Buffer interop allows direct access to OpenCL memory buffers:

import ctypes

from tinygrad import Tensor, dtypes
from tinygrad.helpers import to_mv
import tinygrad.runtime.autogen.opencl as cl  # OpenCL bindings shipped with tinygrad

# create a raw OpenCL buffer (cl_context is an already-initialized OpenCL context)
cl_buf = cl.clCreateBuffer(cl_context, cl.CL_MEM_READ_WRITE, 0x100, None, status := ctypes.c_int32())

# extract pointers: dereference the cl_mem handle to its descriptor, then read
# the raw GPU pointer out of the descriptor
cl_buf_desc_ptr = to_mv(ctypes.addressof(cl_buf), 8).cast('Q')[0]
rawbuf_ptr = to_mv(cl_buf_desc_ptr, 0x100).cast('Q')[20] # offset 0xA0 is a raw gpu pointer

# create a tinygrad tensor backed by the same GPU memory
tiny = Tensor.from_blob(rawbuf_ptr, (8, 8), dtype=dtypes.int, device='QCOM')
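Once wrapped, the tensor can be used like any other tinygrad tensor; a minimal usage sketch:

# minimal usage sketch: run a computation over the wrapped OpenCL buffer
out = (tiny + 1).realize()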

The same approach works for OpenCL images:

# create an OpenCL image (w and h are the image width and height, assumed defined)
cl_img = cl.clCreateImage2D(cl_context, cl.CL_MEM_READ_WRITE, cl.cl_image_format(cl.CL_RGBA, cl.CL_FLOAT), w, h, 0, None, status := ctypes.c_int32())

# extract pointers, exactly as for buffers
cl_buf_desc_ptr = to_mv(ctypes.addressof(cl_img), 8).cast('Q')[0]
rawbuf_ptr = to_mv(cl_buf_desc_ptr, 0x100).cast('Q')[20] # offset 0xA0 is a raw gpu pointer

# create a tinygrad tensor with an image dtype backed by the same GPU memory
tiny = Tensor.from_blob(rawbuf_ptr, (h*w*4,), dtype=dtypes.imagef((h,w)), device='QCOM')

AMD Interfaces

The AMD backend supports several interfaces for communicating with devices:

  • KFD: uses the amdgpu driver.
  • PCI: uses the AM driver.
  • USB: a USB3 interface for ASM24xx chips.

You can force an interface by setting the interface component of the DEV environment variable to one of these values. When set to PCI, this may unbind your GPU from the amdgpu driver.
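A hedged sketch of forcing an interface, assuming the interface name goes in the same position as the compiler options shown in the table above (the exact "AMD:PCI" spelling is an assumption):

# hedged sketch: force the PCI interface before tinygrad initializes its devices
import os
os.environ["DEV"] = "AMD:PCI"  # assumed DEV=BACKEND:INTERFACE pattern

from tinygrad import Tensor
print(Tensor([1.0, 2.0]).realize().tolist())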

NV Interfaces

The NV backend supports several interfaces for communicating with devices:

  • NVK: uses the nvidia driver.
  • PCI: uses the NV driver.

As with AMD, you can force an interface by setting the interface component of the DEV environment variable.