tigrisdata/tag-benchmarks
# TAG Benchmarks

A suite of ML training benchmarks for Tigris object storage and TAG (Tigris Acceleration Gateway). Each benchmark targets a different data format or training topology used in modern ML pipelines.

## Benchmarks

| #  | Name | Format | Dataset |
|----|------|--------|---------|
| 2  | ViT + ImageNet-21K | Individual JPEGs (~5.5M) | 500 GB ImageNet-21K |
| 3  | CLIP + WebDataset | TAR shards (4–256 MB) | 500 GB DataComp |
| 4  | DINOv2 + Lance Embedding Pipeline | Lance columnar | 500 GB synthetic embeddings |
| 5  | Multi-Node Distributed ViT | WebDataset TAR shards | 500 GB DataComp |
| 6  | s3torchbenchmarking (upstream) | Dataset + checkpointing | Synthetic |
| og | ViT + S3IterableDataset | Individual JPEGs | 10 GB synthetic |

## Quick Start (Benchmark 2)

See RUNBOOK.md for a full step-by-step execution guide.

```shell
# On a g5.8xlarge with GPU, NVMe, and Tigris credentials configured:
cd benchmark-2/

# Upload dataset (~12–24 hours for 500 GB)
python upload_imagenet21k.py --subset-size 500 --bucket s3torch-benchmark-2

# Run entitlement benchmark (raw I/O ceiling, no GPU bottleneck)
python benchmark.py --config configs/entitlement.yaml

# Run training benchmark
python benchmark.py --config configs/training.yaml
```

## Repo Structure

```
tag-benchmarks/
├── README.md              # This file
├── RUNBOOK.md             # Step-by-step Benchmark 2 execution guide
├── benchmark-2/           # ViT + ImageNet-21K
│   ├── benchmark.py
│   ├── upload_imagenet21k.py
│   ├── collect_results.py
│   ├── monitor.py
│   └── configs/
├── benchmark-3/           # CLIP + WebDataset
│   ├── benchmark.py
│   ├── prepare_webdataset.py
│   └── configs/
├── benchmark-4/           # DINOv2 + Lance (embedding pipeline)
│   ├── benchmark.py
│   ├── prepare_lance.py
│   └── configs/
├── benchmark-5/           # Multi-node distributed ViT
│   ├── benchmark.py
│   ├── setup_node.sh
│   ├── launch_distributed.sh
│   └── configs/
├── benchmark-6/           # s3torchbenchmarking (Hydra-based)
│   ├── src/
│   ├── conf/
│   └── utils/
└── benchmark-og/          # AWS S3 data loading benchmark replication
    ├── benchmark.py
    └── configs/
```

## Key Design Decisions

**Why g5.8xlarge?** It has a useful ratio of NVMe (900 GB) to RAM (128 GB). TAG's NVMe cache holds the full dataset while the page cache holds only ~25%, making TAG's contribution clearly measurable. Bigger instances (p4d, p5) have so much RAM that the page cache masks TAG's benefit unless datasets are multi-TB.
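A quick sanity check on those cache-coverage numbers (RAM, NVMe, and dataset figures taken from this README; the page-cache fraction is an upper bound, since the kernel shares RAM with the training process):

```python
# Rough cache-coverage math for a g5.8xlarge (figures from this README).
ram_gb = 128      # system RAM
nvme_gb = 900     # local NVMe available to TAG's cache
dataset_gb = 500  # benchmark dataset size

page_cache_fraction = ram_gb / dataset_gb        # upper bound; the kernel holds less in practice
nvme_coverage = min(nvme_gb / dataset_gb, 1.0)   # TAG can cache the entire dataset

print(f"page cache holds at most {page_cache_fraction:.0%} of the dataset")  # -> 26%
print(f"NVMe cache holds {nvme_coverage:.0%} of the dataset")                # -> 100%
```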

**Why entitlement tests?** GPU-bound training benchmarks mask I/O differences: the A10G saturates at a fixed throughput regardless of storage backend. The entitlement test (no-op model) removes the GPU bottleneck and measures raw data pipeline throughput. This mirrors MLPerf Storage methodology.
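The actual `benchmark.py` is driven by the YAML configs; purely as an illustrative sketch, an entitlement run boils down to draining the data loader with a no-op consumer and timing it:

```python
import time

def measure_entitlement(loader, warmup_batches=10, measure_batches=100):
    """Drain a data loader with a no-op 'model' and report samples/sec.

    Illustrative only: the function name and parameters are hypothetical,
    not taken from this repo's benchmark.py.
    """
    it = iter(loader)
    for _ in range(warmup_batches):   # let prefetch and caches settle
        next(it)

    n_samples = 0
    start = time.perf_counter()
    for _ in range(measure_batches):
        batch = next(it)              # I/O + decode only; no forward/backward pass
        n_samples += len(batch)
    elapsed = time.perf_counter() - start
    return n_samples / elapsed        # raw data-pipeline throughput
```

With a real `DataLoader` in place of `loader`, the returned samples/sec is the I/O ceiling that a training run can never exceed.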

**Why 500 GB datasets?** Roughly 4× system RAM (128 GB), in line with MLPerf Storage sizing rules (5×). Smaller datasets risk the page cache masking TAG's contribution.
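The sizing arithmetic, for reference (RAM figure from the g5.8xlarge note above):

```python
ram_gb = 128
dataset_gb = 500

ratio = dataset_gb / ram_gb
print(f"dataset/RAM ratio: {ratio:.2f}x")            # -> 3.91x, just under the 4x floor
print(f"MLPerf Storage 5x target: {5 * ram_gb} GB")  # -> 640 GB
```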
