OpenArm Dataset

Quick start

Install

pip install openarm_dataset

Sample usage

Basic:

>>> import openarm_dataset
>>> dataset = openarm_dataset.Dataset("tests/fixture/dataset_0.3.0")
>>> dataset.meta.episodes
[{'id': '0', 'success': False, 'task_index': 0}, {'id': '3', 'success': True, 'task_index': 0}]
>>> dataset.meta.tasks
[{'prompt': 'Run test.', 'description': 'Longer task description if need.'}]
>>> dataset.num_episodes
2

Obs/Action:

>>> obs = dataset.load_obs(0)
>>> list(obs.keys())
['arms/right/qpos', 'arms/right/qvel', 'arms/right/qtorque', 'arms/left/qpos', 'arms/left/qvel', 'arms/left/qtorque', 'lifter/elevation']
>>> obs["arms/right/qpos"].shape
(746, 8)
>>> obs["arms/right/qpos"].head(2)
                                 joint1    joint2    joint3    joint4    joint5    joint6    joint7   gripper
timestamp
2026-02-25 09:04:11.614229214 -0.039352  0.989118 -0.051771  0.735691  0.077740 -0.070724  0.079488 -0.124674
2026-02-25 09:04:11.618732974 -0.039352  0.989118 -0.051771  0.735691  0.077740 -0.070724  0.079488 -0.124674

>>> action = dataset.load_action(0, use_unixtime=True)
>>> list(action.keys())
['arms/right/qpos', 'arms/left/qpos', 'lifter/elevation']
>>> action["arms/right/qpos"].shape
(90, 8)

Camera:

>>> cameras = dataset.load_cameras(0)
>>> list(cameras.keys())
['wrist_left', 'wrist_right', 'ceiling', 'head']
>>> cam_head = cameras["head"]
>>> cam_head.num_frames
3
>>> cam_head.load_timestamps()
[1772010251.6187909, 1772010251.629775, 1772010251.6634612]
>>> frame = cam_head.get_frame(0)
>>> frame.timestamp
1772010251.6187909
>>> frame.path
PosixPath('.../head/1772010251618790832.jpeg')
>>> frame.load().shape
(600, 960, 3)
>>> for frame in cam_head.frames():
...     pass  # iterate over Frame objects

A camera's frames may be stored either as individual timestamped JPEG files in a directory (episodes/0/cameras/head/<timestamp>.jpeg) or packed into a single uncompressed tar archive (episodes/0/cameras/head.tar). Packing keeps the file count low enough for Hugging Face Hub's storage recommendations. Both layouts expose the same API shown above. For tar-backed cameras, frame.path is a synthetic .../head.tar/<timestamp>.jpeg path that locates the image inside the archive. It is not a real file on disk, so use frame.load() or frame.read_bytes() to access the image data.
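
To make the tar-backed layout more concrete, the following is a minimal sketch of reading one frame out of an uncompressed tar archive with Python's standard tarfile module. It is not the library's implementation: read_frame_bytes and build_demo_archive are hypothetical helpers, and the payload is a placeholder byte string rather than a real JPEG.

```python
import io
import tarfile


def build_demo_archive(frames: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive in memory (stand-in for head.tar)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:") as archive:
        for name, payload in frames.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_frame_bytes(archive_bytes: bytes, member_name: str) -> bytes:
    """Return the raw bytes of one member of an uncompressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as archive:
        member = archive.extractfile(member_name)  # KeyError if the name is absent
        if member is None:  # directories and special files have no data
            raise FileNotFoundError(member_name)
        return member.read()


# Placeholder payload; a real archive would hold JPEG-encoded frames.
demo_frames = {"1772010251618790832.jpeg": b"\xff\xd8 placeholder \xff\xd9"}
demo_archive = build_demo_archive(demo_frames)
frame_bytes = read_frame_bytes(demo_archive, "1772010251618790832.jpeg")
```

Because the image lives inside the archive, code that opens frame.path directly would fail for tar-backed cameras, which is why the byte-oriented accessors are the portable choice.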

Sampling:

>>> samples = dataset.sample(hz=30, episode_index=0)
>>> samples
[Sample(timestamp=1772010251.6202147), Sample(timestamp=1772010251.653548)]
>>> samples[0].timestamp
1772010251.6202147
>>> samples[0].obs["arms/right/qpos"]
array([-0.0393523 ,  0.9891182 , -0.05177076,  0.7356907 ,  0.07774002,
       -0.07072392,  0.07948788, -0.1246737 ], dtype=float32)
>>> samples[0].action["arms/right/qpos"]
array([ 0.03098021,  0.991799  , -0.16657865,  0.96951085,  0.01440866,
        0.14349142, -0.18980259,  0.08221525], dtype=float32)
>>> {name: frame.load().shape for name, frame in samples[0].cameras.items()}
{'wrist_left': (600, 960, 3), 'wrist_right': (600, 960, 3), 'ceiling': (600, 960, 3), 'head': (600, 960, 3)}
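
Each Sample above pairs one timestamp with one value per stream, even though the streams are recorded at different rates. This README does not say how Dataset.sample aligns them, so the following is only a sketch of one common approach: generate evenly spaced ticks and, for each tick, take the most recent record at or before it. The names sample_ticks and latest_at_or_before are hypothetical, and the timestamps are made-up illustration data.

```python
from bisect import bisect_right


def sample_ticks(start: float, end: float, hz: float) -> list[float]:
    """Evenly spaced sample times from start, not going past end."""
    period = 1.0 / hz
    ticks = []
    index = 0
    while start + index * period <= end:
        ticks.append(start + index * period)
        index += 1
    return ticks


def latest_at_or_before(timestamps: list[float], rows: list, tick: float):
    """Pick the row whose timestamp is the latest one not after tick.

    timestamps must be sorted ascending. If tick precedes every
    timestamp, the first row is returned (an assumption of this sketch).
    """
    position = bisect_right(timestamps, tick) - 1
    return rows[max(position, 0)]


# Illustrative stream: five records at irregular times (seconds).
stream_times = [0.000, 0.011, 0.019, 0.031, 0.042]
stream_rows = ["r0", "r1", "r2", "r3", "r4"]

ticks = sample_ticks(start=0.0, end=0.042, hz=50)
aligned = [latest_at_or_before(stream_times, stream_rows, t) for t in ticks]
```

With a 50 Hz rate the ticks fall 20 ms apart, so some records are skipped; a slower stream would instead repeat its latest record across several ticks.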

Command-line tools

Validate a dataset:

openarm-dataset-validate <input>

Exits with status 1 if any errors are reported.

Repair a dataset:

openarm-dataset-repair <input> \
    [-o <output>]    # write the repaired dataset here; repairs in place if omitted

Fills isolated single-frame gaps (a null or NaN in a qpos/qvel/qtorque/value array) by averaging the immediately preceding and following frame values, per array element. Gaps spanning two or more consecutive frames, and gaps at the first or last frame, cannot be averaged and are left untouched with a warning on stderr. The command always exits with status 0; run openarm-dataset-validate afterwards to confirm the result.
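
The repair rule described above can be sketched in a few lines of plain Python. This is an illustration of the rule, not the tool's code: fill_single_frame_gaps and is_gap are hypothetical names, frames are modeled as lists of equal-length lists, and unfilled positions are returned rather than printed to stderr.

```python
import math


def is_gap(value) -> bool:
    """A gap is a null (None) or a NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_single_frame_gaps(frames):
    """Return (repaired copy, list of (frame, element) positions left unfilled)."""
    repaired = [list(frame) for frame in frames]
    unfilled = []
    last = len(frames) - 1
    for i, frame in enumerate(frames):
        for j, value in enumerate(frame):
            if not is_gap(value):
                continue
            # Neighbours are read from the original data, so a run of
            # consecutive gaps is never filled from a freshly filled value.
            interior = 0 < i < last
            if interior and not is_gap(frames[i - 1][j]) and not is_gap(frames[i + 1][j]):
                repaired[i][j] = (frames[i - 1][j] + frames[i + 1][j]) / 2
            else:
                unfilled.append((i, j))
    return repaired, unfilled


nan = float("nan")
demo_frames = [
    [0.0, 1.0],
    [None, 1.5],  # isolated gap: filled with (0.0 + 2.0) / 2
    [2.0, nan],   # first frame of a two-frame gap: left untouched
    [3.0, nan],   # second frame of the same gap: left untouched
    [4.0, 4.0],
    [None, 5.0],  # gap at the last frame: left untouched
]
repaired_frames, unfilled_positions = fill_single_frame_gaps(demo_frames)
```

Since the real command exits with status 0 even when gaps remain, a caller would treat a non-empty unfilled list the same way: as a cue to run validation afterwards.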

Merge multiple datasets:

openarm-dataset-merge <input1> <input2> [<input3> ...] \
    -o <output>    \
    [--symlink]    # create symlinks instead of copying episode data

All input datasets must have the same version, equipment, and frequencies. Tasks are deduplicated by prompt: identical prompts are treated as the same task. Episodes are renumbered sequentially starting from 0.
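
The task deduplication and episode renumbering can be pictured with the metadata shapes shown under Sample usage. The sketch below is an assumption-laden illustration, not the tool's code: merge_metadata is a hypothetical function, it keeps the first description seen for a duplicated prompt, and it writes new episode ids as strings to match the sample output above.

```python
def merge_metadata(datasets):
    """Merge task and episode metadata from several datasets.

    Each dataset is a dict with "tasks" and "episodes" lists shaped
    like dataset.meta.tasks and dataset.meta.episodes.
    """
    merged_tasks = []
    index_by_prompt = {}
    merged_episodes = []
    for dataset in datasets:
        # Map this dataset's local task indices to merged indices.
        local_to_merged = []
        for task in dataset["tasks"]:
            prompt = task["prompt"]
            if prompt not in index_by_prompt:
                index_by_prompt[prompt] = len(merged_tasks)
                merged_tasks.append(task)
            local_to_merged.append(index_by_prompt[prompt])
        # Renumber episodes sequentially from 0 across all inputs.
        for episode in dataset["episodes"]:
            merged_episodes.append({
                **episode,
                "id": str(len(merged_episodes)),
                "task_index": local_to_merged[episode["task_index"]],
            })
    return merged_tasks, merged_episodes


first = {
    "tasks": [{"prompt": "Run test.", "description": "A"}],
    "episodes": [
        {"id": "0", "success": False, "task_index": 0},
        {"id": "3", "success": True, "task_index": 0},
    ],
}
second = {
    "tasks": [
        {"prompt": "Pick cube.", "description": "B"},
        {"prompt": "Run test.", "description": "C"},
    ],
    "episodes": [
        {"id": "0", "success": True, "task_index": 1},
        {"id": "1", "success": True, "task_index": 0},
    ],
}
tasks, episodes = merge_metadata([first, second])
```

Here "Run test." appears in both inputs, so the merged dataset ends up with two tasks and four episodes, and the second dataset's task indices are remapped accordingly.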

Convert a dataset:

openarm-dataset-convert <input> <output> \
    [--format {openarm,lerobot_v2.1,gr00t}] \
    [--camera-format {dir,tar}] # default dir (openarm only); tar packs each \
                                # camera into one .tar archive \
    [--fps INT]                # default 30 (lerobot/gr00t only) \
    [--smoothing-cutoff FLOAT] # default 1.0 (lerobot/gr00t only) \
    [--train-split FLOAT]      # default 0.8 (lerobot/gr00t only) \
    [--success-only]           # lerobot/gr00t only

The --fps, --smoothing-cutoff, --train-split, and --success-only flags apply only when --format is lerobot_v2.1 or gr00t. The gr00t format produces a LeRobot v2.1 dataset plus a GR00T-compatible meta/modality.json (see Isaac-GR00T data preparation).

Upload a dataset to the Hugging Face Hub:

openarm-dataset-upload <input> \
    --repo-id <user>/<dataset> \
    [--private]                # create the repo as private if it does not exist

The whole dataset directory is uploaded to a dataset repository, creating it if it does not already exist, and tagged with the dataset version. Cameras stored as directories of JPEG files are repacked in place into one .tar archive per camera before uploading, to stay within Hugging Face Hub's file-count recommendations. Repacking is lossless and reversible (openarm-dataset-convert --camera-format dir restores the JPEG-directory layout).
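
The claim that repacking is lossless and reversible can be checked with a small round trip. The sketch below packs a directory of frame files into an uncompressed tar and restores it, then compares bytes. It is not the uploader's code: pack_camera_dir and unpack_camera_tar are hypothetical helpers, the file contents are placeholders, and unlike the real tool it writes the archive next to the directory instead of repacking in place.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_camera_dir(camera_dir: Path, archive_path: Path) -> None:
    """Store every file in camera_dir in an uncompressed tar archive."""
    with tarfile.open(archive_path, mode="w:") as archive:
        for frame_path in sorted(camera_dir.iterdir()):
            archive.add(frame_path, arcname=frame_path.name)


def unpack_camera_tar(archive_path: Path, camera_dir: Path) -> None:
    """Restore the one-file-per-frame layout from the archive."""
    camera_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:") as archive:
        for member in archive.getmembers():
            source = archive.extractfile(member)
            if source is not None:  # skip anything that is not a regular file
                # Keep only the base name so members cannot escape camera_dir.
                (camera_dir / Path(member.name).name).write_bytes(source.read())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original_dir = root / "head"
    original_dir.mkdir()
    # Placeholder file names and contents standing in for JPEG frames.
    (original_dir / "frame_a.jpeg").write_bytes(b"frame-a-bytes")
    (original_dir / "frame_b.jpeg").write_bytes(b"frame-b-bytes")

    pack_camera_dir(original_dir, root / "head.tar")
    unpack_camera_tar(root / "head.tar", root / "restored")

    originals = {p.name: p.read_bytes() for p in original_dir.iterdir()}
    restored = {p.name: p.read_bytes() for p in (root / "restored").iterdir()}
    roundtrip_ok = originals == restored
```

Because the archive is uncompressed and stores the files as-is, no re-encoding of the images takes place, which is what makes the round trip byte-for-byte identical.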

Development

Test

uv sync
uv run pytest

License

Licensed under the Apache License 2.0. See LICENSE.txt for details.

Code of Conduct

All participation in the OpenArm project is governed by our Code of Conduct.
