Multi-Task Learning#

In the multi-task learning benchmark, we study training on multi-task pretraining datasets. We perform policy learning on the Human Pretraining Datasets, which contain data across 300 tasks: 65 atomic tasks and 235 composite tasks. We provide 100 demonstrations per task, resulting in 482 hours of data in total.
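As a quick sanity check on the dataset scale quoted above (the arithmetic below is derived from those numbers, not an official statistic):

```python
# Dataset-scale sanity check from the figures quoted above.
n_tasks = 65 + 235          # atomic + composite tasks
demos_per_task = 100
total_demos = n_tasks * demos_per_task
total_hours = 482

print(total_demos)          # 30000 demonstrations
avg_sec = total_hours * 3600 / total_demos
print(round(avg_sec, 1))    # 57.8 seconds per demonstration on average
```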


Benchmark instructions#

Diffusion Policy#

Guidelines#

  • We use a batch size of 192 and train for 250k steps

  • We evaluate the model in pretrain scenes

Train model#

python train.py \
--config-name=train_diffusion_transformer_bs192 \
task=robocasa/pretrain_human300

Evaluate model#

python eval_robocasa.py \
--checkpoint <checkpoint-path> \
--task_set atomic_seen composite_seen composite_unseen \
--split pretrain

Report evaluation results#

python diffusion_policy/scripts/get_eval_stats.py \
--dir <outputs-dir>

Openpi#

Guidelines#

  • We use a batch size of 64 and train for 75k steps

  • We evaluate the model in pretrain scenes

Train model#

XLA_PYTHON_CLIENT_MEM_FRACTION=1.0 python scripts/train.py \
pi0_robocasa_pretrain_human300 \
--exp-name=multitask_learning

Evaluate model#

# part a: start inference server
python scripts/serve_policy.py \
--port=8000 policy:checkpoint \
--policy.config=pi0_robocasa_pretrain_human300 \
--policy.dir=expdata/pi0_robocasa_pretrain_human300/multitask_learning/75000

# part b: run evals on server
python examples/robocasa/main.py \
--args.port 8000 \
--args.task_set atomic_seen composite_seen composite_unseen \
--args.split pretrain \
--args.log_dir expdata/pi0_robocasa_pretrain_human300/multitask_learning/75000
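Parts a and b above are normally run in two terminals. If you prefer to drive both from one script, here is a minimal orchestration sketch (not part of the openpi repo; it assumes the server is ready once its TCP port accepts connections, which may need adjusting):

```python
import socket
import subprocess
import time

CKPT = "expdata/pi0_robocasa_pretrain_human300/multitask_learning/75000"

def serve_cmd(ckpt: str, port: int = 8000) -> list[str]:
    # Mirrors "part a" above.
    return [
        "python", "scripts/serve_policy.py", f"--port={port}",
        "policy:checkpoint",
        "--policy.config=pi0_robocasa_pretrain_human300",
        f"--policy.dir={ckpt}",
    ]

def eval_cmd(ckpt: str, port: int = 8000) -> list[str]:
    # Mirrors "part b" above.
    return [
        "python", "examples/robocasa/main.py",
        "--args.port", str(port),
        "--args.task_set", "atomic_seen", "composite_seen", "composite_unseen",
        "--args.split", "pretrain",
        "--args.log_dir", ckpt,
    ]

def wait_for_port(port: int, timeout: float = 300.0) -> bool:
    # Poll until something accepts TCP connections on the port.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection(("localhost", port), timeout=1):
                return True
        except OSError:
            time.sleep(2)
    return False

def run_eval(ckpt: str = CKPT, port: int = 8000) -> None:
    # Requires the openpi repo; call this from its root directory.
    server = subprocess.Popen(serve_cmd(ckpt, port))
    try:
        if not wait_for_port(port):
            raise RuntimeError("inference server never became reachable")
        subprocess.run(eval_cmd(ckpt, port), check=True)
    finally:
        server.terminate()
        server.wait()
```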

Report evaluation results#

python examples/robocasa/get_eval_stats.py \
--dir expdata/pi0_robocasa_pretrain_human300/multitask_learning/75000

GR00T#

Guidelines#

  • We use a batch size of 128 and train for 120k steps

  • We evaluate the model in pretrain scenes
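For a rough sense of relative training budget, the batch sizes and step counts listed in the three guideline sections imply the following gradient-sample counts (just batch × steps; this ignores sequence/chunk lengths and is only a back-of-the-envelope comparison):

```python
# (batch size, training steps) from each method's guidelines above.
configs = {
    "diffusion_policy": (192, 250_000),
    "openpi_pi0":       (64,  75_000),
    "gr00t":            (128, 120_000),
}
for name, (batch, steps) in configs.items():
    print(f"{name}: {batch * steps:,} samples seen")
# diffusion_policy: 48,000,000 samples seen
# openpi_pi0: 4,800,000 samples seen
# gr00t: 15,360,000 samples seen
```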

Train model#

python scripts/gr00t_finetune.py \
--output-dir expdata/multitask_learning \
--dataset_soup pretrain_human300 \
--max_steps 120000

Evaluate model#

python scripts/run_eval.py \
--model_path expdata/multitask_learning/checkpoint-120000 \
--task_set atomic_seen composite_seen composite_unseen \
--split pretrain

Report evaluation results#

python gr00t/eval/get_eval_stats.py \
--dir expdata/multitask_learning/checkpoint-120000
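The `get_eval_stats` scripts above report aggregate results per task set. If you post-process raw per-task success rates yourself, the aggregation is just a mean over the tasks in each set; a toy sketch (the task names, numbers, and layout here are illustrative, not the scripts' actual output format):

```python
from statistics import mean

# Hypothetical per-task success rates, grouped by task set.
results = {
    "atomic_seen":      {"PnPCounterToSink": 0.62, "OpenDoor": 0.71},
    "composite_seen":   {"RestockPantry": 0.34},
    "composite_unseen": {"PrepareCoffee": 0.18, "ClearTable": 0.22},
}

# Average within each task set, as in the reported benchmark numbers.
summary = {task_set: mean(rates.values()) for task_set, rates in results.items()}
for task_set, rate in summary.items():
    print(f"{task_set}: {rate:.1%}")
```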