RoboCasa is a large-scale simulation framework for training generally capable robots to perform everyday tasks. It features realistic and diverse human-centered environments with a focus on kitchen scenes. We create these environments with the aid of generative AI tools, such as large language models (LLMs) and text-to-image/3D generative models. We provide over 2,500 3D assets across 150+ object categories and dozens of interactable furniture items and appliances. As part of the first release, we include a suite of 100 tasks representing a wide spectrum of everyday activities. Alongside the simulated tasks, we offer a dataset of high-quality human demonstrations and use automated trajectory generation techniques to substantially expand the training data at little additional cost.

Realistic and Diverse Scenes

In this initial release, we focus on kitchen scenes. To capture the complexity and diversity of real-world environments, we consult numerous architecture and home design magazines and compile a collection of kitchen layouts and styles reflecting the wide variety of kitchens in homes around the world. We model these kitchens according to standard size and spatial specifications and furnish them from a large repository of interactable furniture and appliances, including cabinets, stoves, sinks, microwaves, and more.

Cross-Embodiment Support

The simulator supports mobile manipulators of diverse form factors, such as single-arm mobile platforms, humanoid robots, and quadruped robots with arms.

Interactable Furniture and Appliances

Each kitchen scene is equipped with a selection of interactable furniture and appliances. Several types of interactable objects are articulated; for example, a robot can open and close doors on microwaves and twist knobs on stoves. Other types of interactable objects can undergo state changes; for example, when a knob on the stove is turned, the corresponding burner turns on.
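To make the coupling between an articulated part and a state change concrete, here is a minimal Python sketch in which a stove knob's joint angle determines whether its burner is on. The `StoveKnob` class, its 20-degree ignition threshold, and its joint range of [0, π] radians are hypothetical illustrations, not RoboCasa's actual implementation, which is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class StoveKnob:
    """Hypothetical knob model: the burner state is derived from the knob's joint angle."""

    angle: float = 0.0                       # joint angle in radians; 0 means fully off
    on_threshold: float = math.radians(20)   # assumed angle at which the burner ignites

    def twist(self, delta: float) -> None:
        # Rotate the knob, clamping to an assumed joint range of [0, pi] radians.
        self.angle = min(max(self.angle + delta, 0.0), math.pi)

    @property
    def burner_on(self) -> bool:
        # State change: the burner is on whenever the knob is past the threshold.
        return self.angle >= self.on_threshold


knob = StoveKnob()
knob.twist(math.radians(10))   # 10 degrees: below the threshold, burner stays off
knob.twist(math.radians(30))   # 40 degrees total: burner turns on
```

Deriving `burner_on` from the joint angle, rather than storing it separately, keeps the articulated pose and the appliance state from drifting out of sync.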

Augmenting Scene Diversity with Text-to-Image Models

Each scene can be customized by swapping in textures from a large selection of high-quality AI-generated textures created with MidJourney, a popular text-to-image model. We provide 100 textures each for walls, floors, counters, and cabinet panels. These textures serve as a form of realistic domain randomization that substantially increases the visual diversity of our training datasets.
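As a minimal sketch of how these textures could be used for domain randomization, the snippet below draws one texture per surface category each time a scene is reset. The file names, the `sample_scene_textures` helper, and the assumption that categories are sampled independently are hypothetical and do not reflect RoboCasa's actual asset layout or API.

```python
import random

# Hypothetical texture catalogue: 100 textures per surface category.
# File names are placeholders, not real RoboCasa asset paths.
SURFACES = ("wall", "floor", "counter", "cabinet_panel")
TEXTURES = {s: [f"{s}_{i:03d}.png" for i in range(100)] for s in SURFACES}


def sample_scene_textures(rng: random.Random) -> dict[str, str]:
    """Independently draw one texture for each surface category."""
    return {surface: rng.choice(options) for surface, options in TEXTURES.items()}


# Seeding the generator makes a randomized scene reproducible.
scene_textures = sample_scene_textures(random.Random(0))
```

With 100 options in each of four categories, independent sampling yields 100^4 = 100,000,000 possible texture combinations for every kitchen layout.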

Creating Diverse Object Assets with Text-to-3D Models

We curate a repository of over 2,500 objects across more than 150 categories, spanning fruits, vegetables, packaged foods, receptacles, and more. Some of the object assets come from the Objaverse dataset; the majority are generated with text-to-3D models provided by Luma AI.

Training Foundational Robot Skills

We focus on eight foundational skills that serve as the building blocks for the long-horizon manipulation behaviors underlying most household activities: (1) pick and place, (2) opening and closing doors, (3) opening and closing drawers, (4) twisting knobs, (5) turning levers, (6) pressing buttons, (7) insertion, and (8) navigation. The current release includes 25 atomic tasks for systematically training and evaluating these skills.

[Videos: Pick and Place; Opening and Closing Doors; Turning Levers; Twisting Knobs; Pressing Buttons]

Generating Composite Tasks with LLM Guidance

Composite tasks sequence multiple skills to accomplish semantically meaningful activities, from restocking kitchen supplies to brewing coffee. Our goal in creating these tasks is to capture realistic and diverse activities that reflect the ecological statistics of real-world household chores. We use large language models (LLMs), specifically GPT-4, to guide the definition of these tasks: because LLMs encode a vast amount of common sense and world knowledge, they can effectively propose candidate tasks given the environments and the robot's skills.
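One way to picture a composite task is as an ordered list of steps, each drawing on one of the eight foundational skills. The minimal Python sketch below checks that a candidate plan, such as one proposed by an LLM, uses only those skills. The skill identifiers, the `Step` and `validate_composite_task` names, and the "Brewing Coffee" decomposition are all hypothetical illustrations, not RoboCasa's task definitions.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the eight foundational skills.
SKILLS = frozenset({
    "pick_and_place", "open_close_door", "open_close_drawer", "twist_knob",
    "turn_lever", "press_button", "insertion", "navigation",
})


@dataclass(frozen=True)
class Step:
    skill: str   # must be one of SKILLS
    target: str  # free-form description of the object or fixture acted on


def validate_composite_task(name: str, steps: list[Step]) -> list[Step]:
    """Reject empty plans and plans that call for skills outside SKILLS."""
    if not steps:
        raise ValueError(f"{name}: a composite task needs at least one step")
    unknown = [s.skill for s in steps if s.skill not in SKILLS]
    if unknown:
        raise ValueError(f"{name}: unknown skills {unknown}")
    return steps


# Hypothetical decomposition of "Brewing Coffee" into foundational skills.
BREW_COFFEE = validate_composite_task("BrewingCoffee", [
    Step("navigation", "coffee machine"),
    Step("pick_and_place", "mug from counter to coffee machine"),
    Step("press_button", "coffee machine start button"),
])
```

A check like this acts as a filter between free-form LLM proposals and tasks the robot can actually attempt with its existing skill set.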

[Videos: Steaming Vegetables; Restocking Kitchen Supplies; Brewing Coffee]

Citation

@inproceedings{robocasa2024,
  title={RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots},
  author={Soroush Nasiriany and Abhiram Maddukuri and Lance Zhang and Adeet Parikh and Aaron Lo and Abhishek Joshi and Ajay Mandlekar and Yuke Zhu},
  booktitle={Robotics: Science and Systems (RSS)},
  year={2024}
}