Foundation Model Learning#
In the foundation model learning benchmark, we study foundation model training: training on our pretraining datasets, followed by fine-tuning on our target datasets. Training proceeds in two phases:
Pretraining: training on 2000+ hours of data across 300 tasks. The data comprises human datasets for all 300 tasks (482 hours) and synthetic MimicGen data for 60 atomic tasks (1615 hours), for a total of 2097 hours.
Target task fine-tuning: fine-tuning the pretrained model on human datasets across 50 tasks (193 hours). We fine-tune the model independently on three separate splits of target data:
Atomic-Seen (18 atomic tasks, also seen in pretraining)
Composite-Seen (16 composite tasks, also seen in pretraining)
Composite-Unseen (a separate set of 16 composite tasks, not seen in pretraining)
Benchmark instructions#
GR00T#
Guidelines#
We use a batch size of 128
For pretraining, we train for 80k steps
For target task fine-tuning, we train for 60k steps
We always evaluate the models in target scenes
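Note that the commands below set the step budgets via --max_steps but do not pass a batch-size flag, so the batch size of 128 is assumed to be the script's default. If you need to confirm or change it, you can list the available options (assuming the script supports the standard --help flag):

python scripts/gr00t_finetune.py --help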
Train model#
# run pretraining
python scripts/gr00t_finetune.py \
--output-dir expdata/foundation_model_learning/pretraining \
--dataset_soup pretrain_human300_mg60 \
--max_steps 80000
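# the pretraining run above is expected to write its final checkpoint to
# expdata/foundation_model_learning/pretraining/checkpoint-80000, which the
# fine-tuning commands below load via --base_model_path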
# target task fine-tuning: for atomic-seen, composite-seen, composite-unseen tasks
# the following three training experiments can be run in parallel
python scripts/gr00t_finetune.py \
--output-dir expdata/foundation_model_learning/target_task_finetuning/atomic_seen \
--base_model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
--dataset_soup target_atomic_seen \
--max_steps 60000
python scripts/gr00t_finetune.py \
--output-dir expdata/foundation_model_learning/target_task_finetuning/composite_seen \
--base_model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
--dataset_soup target_composite_seen \
--max_steps 60000
python scripts/gr00t_finetune.py \
--output-dir expdata/foundation_model_learning/target_task_finetuning/composite_unseen \
--base_model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
--dataset_soup target_composite_unseen \
--max_steps 60000
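Since the three fine-tuning runs differ only in the data split and output directory, they can also be launched from a small shell loop. This is just a convenience sketch reusing the flags and paths from the commands above; it runs the three jobs sequentially, so launch them on separate GPUs or machines if you want true parallelism.

# sketch: launch the three target fine-tuning runs in a loop
for split in atomic_seen composite_seen composite_unseen; do
  python scripts/gr00t_finetune.py \
    --output-dir expdata/foundation_model_learning/target_task_finetuning/${split} \
    --base_model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
    --dataset_soup target_${split} \
    --max_steps 60000
done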
Evaluate model#
# evaluate the pretrained model on all three task sets
python scripts/run_eval.py \
--model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
--task_set atomic_seen composite_seen composite_unseen \
--split target
# evaluate target fine-tuning: atomic-seen tasks
python scripts/run_eval.py \
--model_path expdata/foundation_model_learning/target_task_finetuning/atomic_seen/checkpoint-60000 \
--task_set atomic_seen \
--split target
# evaluate target fine-tuning: composite-seen tasks
python scripts/run_eval.py \
--model_path expdata/foundation_model_learning/target_task_finetuning/composite_seen/checkpoint-60000 \
--task_set composite_seen \
--split target
# evaluate target fine-tuning: composite-unseen tasks
python scripts/run_eval.py \
--model_path expdata/foundation_model_learning/target_task_finetuning/composite_unseen/checkpoint-60000 \
--task_set composite_unseen \
--split target
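As with fine-tuning, the three per-split evaluations share the same structure and can be scripted. The sketch below reuses only the paths and flags from the commands above.

# sketch: evaluate each fine-tuned checkpoint on its own task set
for split in atomic_seen composite_seen composite_unseen; do
  python scripts/run_eval.py \
    --model_path expdata/foundation_model_learning/target_task_finetuning/${split}/checkpoint-60000 \
    --task_set ${split} \
    --split target
done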
Report evaluation results#
python gr00t/eval/get_eval_stats.py \
--dir <your-ckpt>
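For example, to aggregate the results of the atomic-seen fine-tuning run evaluated above (assuming --dir takes the same checkpoint directory that was passed to run_eval.py):

python gr00t/eval/get_eval_stats.py \
    --dir expdata/foundation_model_learning/target_task_finetuning/atomic_seen/checkpoint-60000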