Overview of Datasets

RoboCasa offers over 2,200 hours of demonstration data, comprising human teleoperation data and synthetic data. The data is split into pretraining datasets and target datasets. The pretraining datasets feature 300 diverse tasks across 2,500 pretraining kitchens. The target datasets feature 50 target tasks across a distinct set of 10 held-out target kitchens.

Dataset statistics across pretraining and target settings.
Setting	Num Tasks	Num Scenes	Demos per Task	Dataset Size (hrs)
Pretraining (Human)	300	2500	100	482
Pretraining (MimicGen)	60	2500	~10,000	1615
Target (Human)	50	10	500	193
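The headline totals follow directly from this table. The short Python sketch below recomputes total hours and demonstration counts from the rows above. It is only a consistency check on the published numbers. The MimicGen figure of 10,000 demos per task is approximate (~10k), so its demo total is approximate too.

```python
# Rows copied from the dataset statistics table:
# (setting, num_tasks, num_scenes, demos_per_task, hours)
ROWS = [
    ("Pretraining (Human)", 300, 2500, 100, 482),
    ("Pretraining (MimicGen)", 60, 2500, 10_000, 1615),  # ~10k demos per task
    ("Target (Human)", 50, 10, 500, 193),
]

def total_demos(row):
    """Demos in one setting = number of tasks x demos per task."""
    _, num_tasks, _, demos_per_task, _ = row
    return num_tasks * demos_per_task

# Sum the hours column, first for pretraining only, then for everything.
pretraining_hours = sum(r[4] for r in ROWS if r[0].startswith("Pretraining"))
all_hours = sum(r[4] for r in ROWS)

for row in ROWS:
    print(f"{row[0]:<24} {total_demos(row):>8,} demos  {row[4]:>5} h")
print(f"Pretraining total: {pretraining_hours} h")
print(f"Overall total:     {all_hours} h")
```

Running it reports 30,000 human pretraining demos, roughly 600,000 MimicGen demos, and 25,000 target demos. It also reports 2,097 pretraining hours and 2,290 hours overall, consistent with the "~2,000" and "over 2,200" hour figures in the text.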

We provide a detailed overview of the pretraining and target datasets below.

Pretraining Datasets

RoboCasa offers about 2,100 hours of pretraining demonstration data (482 hours of human data plus 1,615 hours of synthetic data). The pretraining datasets feature 300 diverse tasks across 2,500 pretraining kitchens and include both human and synthetic datasets:

Human Datasets

482 hours of data collected via teleoperation. The data spans 300 tasks (65 atomic tasks and 235 composite tasks), with 100 demonstrations per task. See the Atomic Tasks and Composite Tasks pages for the list of supported tasks.

Synthetic Datasets

1,615 hours of data generated via MimicGen. The data spans 60 atomic tasks, with ~10,000 demonstrations per task. See the Atomic Tasks page for the list of supported tasks.

Target Datasets

In addition to the pretraining data, RoboCasa offers 193 hours of high-quality demonstration data for target tasks, collected via teleoperation. The target datasets feature 50 diverse tasks across 10 target kitchen scenes. These target scenes are distinct from the scenes in the pretraining datasets. Each task has 500 human demonstrations.

We split these datasets into three groups:

Atomic-Seen (18 tasks): atomic tasks that also appear in the pretraining datasets.
Composite-Seen (16 tasks): composite tasks that also appear in the pretraining datasets.
Composite-Unseen (16 tasks): composite tasks that appear only in the target datasets, not in the pretraining datasets.
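The three groups partition the 50 target tasks. The sketch below encodes them as plain Python records and derives how many target tasks have pretraining coverage. The `TargetGroup` class and its fields are illustrative only and are not part of any RoboCasa API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TargetGroup:
    # Hypothetical record describing one target-task group.
    name: str
    num_tasks: int
    kind: str             # "atomic" or "composite"
    in_pretraining: bool  # whether these tasks also appear in pretraining data

TARGET_GROUPS = [
    TargetGroup("Atomic-Seen", 18, "atomic", True),
    TargetGroup("Composite-Seen", 16, "composite", True),
    TargetGroup("Composite-Unseen", 16, "composite", False),
]

# Count all target tasks, then split them by pretraining coverage.
total_tasks = sum(g.num_tasks for g in TARGET_GROUPS)
seen_tasks = sum(g.num_tasks for g in TARGET_GROUPS if g.in_pretraining)
unseen_tasks = total_tasks - seen_tasks
composite_tasks = sum(g.num_tasks for g in TARGET_GROUPS if g.kind == "composite")

print(total_tasks, seen_tasks, unseen_tasks, composite_tasks)  # 50 34 16 32
```

Thirty-four of the 50 target tasks have pretraining counterparts. The 16 Composite-Unseen tasks are the only ones with no pretraining data at all.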

Target composite task datasets include per-frame subtask annotations to support hierarchical policy learning. Every timestep is labeled with a subtask index, an atomic-skill name, a stage (i.e., pick, place, or navigate), and a natural-language instruction.
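A hierarchical policy often trains on contiguous subtask segments rather than on individual frames. The sketch below shows one way to collapse per-frame labels into such segments. It assumes the annotations have already been loaded into one Python record per timestep. The `FrameLabel` and `Segment` field names, the skill names, and the example instructions are all hypothetical. They do not reflect the actual storage keys or contents of the RoboCasa dataset files. In this made-up example, a single subtask spans both a pick stage and a place stage.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class FrameLabel:
    # Hypothetical per-timestep record mirroring the annotation fields described above.
    subtask_index: int
    skill: str        # atomic-skill name
    stage: str        # e.g. "navigate", "pick", "place"
    instruction: str  # natural-language instruction

@dataclass(frozen=True)
class Segment:
    subtask_index: int
    skill: str
    instruction: str
    start: int  # first timestep (inclusive)
    end: int    # last timestep (exclusive)

def segment_by_subtask(labels):
    """Group consecutive timesteps that share a subtask index into segments."""
    segments = []
    t = 0
    for idx, run in groupby(labels, key=lambda lab: lab.subtask_index):
        run = list(run)
        first = run[0]  # take skill/instruction from the segment's first frame
        segments.append(Segment(idx, first.skill, first.instruction, t, t + len(run)))
        t += len(run)
    return segments

# Made-up 7-step trajectory: 3 navigation frames, then one pick-and-place subtask.
example_labels = (
    [FrameLabel(0, "navigate", "navigate", "go to the counter")] * 3
    + [FrameLabel(1, "pick_place", "pick", "move the cup to the sink")] * 2
    + [FrameLabel(1, "pick_place", "place", "move the cup to the sink")] * 2
)

segments = segment_by_subtask(example_labels)
for seg in segments:
    print(seg.subtask_index, seg.skill, seg.start, seg.end, seg.instruction)
```

Grouping on `subtask_index` alone keeps the pick and place stages in one segment. A finer split by stage would only require changing the `groupby` key to `(lab.subtask_index, lab.stage)`.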

Atomic-Seen Tasks

Task	Description	Horizon	Video

Composite-Seen Tasks

Task	Description

Composite-Unseen Tasks

Task	Description
