A suite of ML training benchmarks for Tigris object storage and TAG (Tigris Acceleration Gateway). Each benchmark targets a different data format or training topology used in modern ML pipelines.
| # | Name | Format | Dataset |
|---|---|---|---|
| 2 | ViT + ImageNet-21K | Individual JPEGs (~5.5M) | 500 GB ImageNet-21K |
| 3 | CLIP + WebDataset | TAR shards (4–256 MB) | 500 GB DataComp |
| 4 | DINOv2 + Lance Embedding Pipeline | Lance columnar | 500 GB synthetic embeddings |
| 5 | Multi-Node Distributed ViT | WebDataset TAR shards | 500 GB DataComp |
| 6 | s3torchbenchmarking (upstream) | Dataset + checkpointing | Synthetic |
| og | ViT + S3IterableDataset | Individual JPEGs | 10 GB synthetic |
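Benchmarks 3 and 5 stream those TAR shards with WebDataset. As a minimal sketch of what consuming such shards looks like (the shard URL pattern and bucket name here are hypothetical, not the benchmark's actual layout):

```python
import webdataset as wds

# Hypothetical shard pattern; the real bucket/prefix may differ.
urls = "pipe:aws s3 cp s3://example-bucket/shard-{000000..000127}.tar -"

dataset = (
    wds.WebDataset(urls)
    .decode("pil")            # decode image members with PIL
    .to_tuple("jpg", "json")  # yield (image, metadata) pairs
)

for image, meta in dataset:
    pass  # hand off to a DataLoader / training loop
```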
See RUNBOOK.md for a full step-by-step execution guide.
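Tigris speaks the S3 API, so the scripts are expected to pick up standard AWS-style credentials. A minimal sketch, assuming the underlying S3 client honors `AWS_ENDPOINT_URL_S3` (key values are placeholders):

```bash
# Hypothetical credential setup; substitute your own Tigris keys.
export AWS_ACCESS_KEY_ID=tid_xxxxxxxx
export AWS_SECRET_ACCESS_KEY=tsec_xxxxxxxx
export AWS_ENDPOINT_URL_S3=https://fly.storage.tigris.dev
export AWS_REGION=auto
```

With credentials exported, the quick start is: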
```bash
# On a g5.8xlarge with GPU, NVMe, and Tigris credentials configured:
cd benchmark-2/

# Upload dataset (~12–24 hours for 500 GB)
python upload_imagenet21k.py --subset-size 500 --bucket s3torch-benchmark-2

# Run entitlement benchmark (raw I/O ceiling, no GPU bottleneck)
python benchmark.py --config configs/entitlement.yaml

# Run training benchmark
python benchmark.py --config configs/training.yaml
```

Repository layout:

```
tag-benchmarks/
├── README.md                 # This file
├── RUNBOOK.md                # Step-by-step Benchmark 2 execution guide
├── benchmark-2/              # ViT + ImageNet-21K
│   ├── benchmark.py
│   ├── upload_imagenet21k.py
│   ├── collect_results.py
│   ├── monitor.py
│   └── configs/
├── benchmark-3/              # CLIP + WebDataset
│   ├── benchmark.py
│   ├── prepare_webdataset.py
│   └── configs/
├── benchmark-4/              # DINOv2 + Lance (embedding pipeline)
│   ├── benchmark.py
│   ├── prepare_lance.py
│   └── configs/
├── benchmark-5/              # Multi-node distributed ViT
│   ├── benchmark.py
│   ├── setup_node.sh
│   ├── launch_distributed.sh
│   └── configs/
├── benchmark-6/              # s3torchbenchmarking (Hydra-based)
│   ├── src/
│   ├── conf/
│   └── utils/
└── benchmark-og/             # AWS S3 data loading benchmark replication
    ├── benchmark.py
    └── configs/
```
Why g5.8xlarge? It has a useful ratio of NVMe (900 GB) to RAM (128 GB): TAG's NVMe cache holds the full 500 GB dataset, while the OS page cache can hold only ~25% of it, making TAG's contribution clearly measurable. Bigger instances (p4d, p5) have so much RAM that the page cache masks TAG's benefit unless datasets are multi-TB.
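The cache math, spelled out with the sizes from the paragraph above:

```python
ram_gb, nvme_gb, dataset_gb = 128, 900, 500

assert nvme_gb >= dataset_gb  # TAG's NVMe cache fits the whole dataset
print(f"page cache coverage: {ram_gb / dataset_gb:.0%}")  # ~26% of the dataset
```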
Why entitlement tests? GPU-bound training benchmarks mask I/O differences — the A10G saturates at a fixed throughput regardless of storage backend. The entitlement test (no-op model) removes the GPU bottleneck and measures raw data pipeline throughput. This mirrors MLPerf Storage methodology.
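A minimal sketch of an entitlement-style loop (hypothetical; the actual benchmark.py will differ in wiring and reporting): drain the DataLoader with no model attached and report raw samples/s.

```python
import time
from torch.utils.data import DataLoader

def measure_entitlement(dataset, batch_size=256, num_workers=8):
    # No-op consumer: pull batches as fast as the storage/decode
    # pipeline can deliver them, with no forward/backward pass in the way.
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    n, t0 = 0, time.perf_counter()
    for images, _labels in loader:  # assumes (image, label) samples
        n += images.size(0)         # count samples; do no compute
    dt = time.perf_counter() - t0
    print(f"entitlement: {n / dt:.1f} samples/s over {dt:.1f}s")
```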
Why 500 GB datasets? 500 GB is roughly 4× system RAM (128 GB), close to the 5× sizing rule MLPerf Storage uses. Smaller datasets risk the page cache masking TAG's contribution.