# Foundation Model Learning
In the foundation model learning benchmark we study foundation model training: training on our pretraining datasets, followed by fine-tuning on our target datasets. There are two phases of training:

1. **Pretraining:** training on 2000+ hours of data across 300 tasks. The data comprises human datasets for the 300 tasks (482 hours) and synthetic MimicGen data across 60 atomic tasks (1615 hours).
2. **Target task fine-tuning:** fine-tuning the pretrained model on human datasets across 50 tasks (193 hours). We fine-tune the model independently on three separate splits of target data:
   - Atomic-Seen (18 atomic tasks, also seen in pretraining)
   - Composite-Seen (16 composite tasks, also seen in pretraining)
   - Composite-Unseen (a separate set of 16 composite tasks, not seen in pretraining)
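As a quick sanity check, the per-source hour figures stated above add up to the headline total:

```python
# Sanity-check the pretraining data totals stated above.
human_hours = 482      # human datasets across 300 tasks
mimicgen_hours = 1615  # synthetic MimicGen data across 60 atomic tasks
total = human_hours + mimicgen_hours
print(total)  # 2097, consistent with the "2000+ hours" figure
```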
## Benchmark results and checkpoints

We perform a benchmark featuring the GR00T N1.5 algorithm. We compare pretraining only, target task learning only, and pretraining followed by target task fine-tuning. Here is a summary of our benchmarking results (average task success rate, in %). We share the model checkpoints for reference.
| Task Type | Pretraining Only | Target Task Learning Only (10%) | Target Task Learning Only (30%) | Target Task Learning Only (100%) | Pretraining + Target Task Post-Training (10%) | Pretraining + Target Task Post-Training (30%) | Pretraining + Target Task Post-Training (100%) |
|---|---|---|---|---|---|---|---|
| Atomic-Seen | 41.9% | 38.7% | 50.6% | 60.6% | 56.9% | 59.1% | 68.5% |
| Composite-Seen | 0.0% | 11.0% | 22.7% | 35.0% | 25.4% | 34.6% | 40.6% |
| Composite-Unseen | 0.2% | 11.2% | 27.5% | 33.3% | 22.7% | 30.8% | 42.1% |
| Average | 15.1% | 21.0% | 34.3% | 43.7% | 35.9% | 42.2% | 51.1% |
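The Average row matches a task-count-weighted mean over the three splits (18, 16, and 16 tasks respectively), rather than an unweighted mean of the three rows. A small sketch that reproduces it from the per-split numbers:

```python
# Reproduce the "Average" row as a task-count-weighted mean of the three splits.
# Task counts (18/16/16) are taken from the fine-tuning split description above.
counts = [18, 16, 16]  # Atomic-Seen, Composite-Seen, Composite-Unseen

# Columns: Pretraining Only; Target Only 10/30/100%; Pretrain + Post-Train 10/30/100%
rows = {
    "atomic_seen":      [41.9, 38.7, 50.6, 60.6, 56.9, 59.1, 68.5],
    "composite_seen":   [ 0.0, 11.0, 22.7, 35.0, 25.4, 34.6, 40.6],
    "composite_unseen": [ 0.2, 11.2, 27.5, 33.3, 22.7, 30.8, 42.1],
}

avg = [
    round(sum(c * vals[j] for c, vals in zip(counts, rows.values())) / sum(counts), 1)
    for j in range(7)
]
print(avg)  # [15.1, 21.0, 34.3, 43.7, 35.9, 42.2, 51.1] — the reported Average row
```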
### Model Checkpoints

| Model Checkpoint | Link |
|---|---|
| Pretraining Only | Link |
| Target Only (100%) - Atomic-Seen | Link |
| Target Only (100%) - Composite-Seen | Link |
| Target Only (100%) - Composite-Unseen | Link |
| Pretraining + Target Post-Training (100%) - Atomic-Seen | Link |
| Pretraining + Target Post-Training (100%) - Composite-Seen | Link |
| Pretraining + Target Post-Training (100%) - Composite-Unseen | Link |
## Benchmark instructions

### GR00T

#### Guidelines
- We use a batch size of 128.
- For pretraining, we train for 80k steps.
- For target task fine-tuning, we train for 60k steps.
- We always evaluate the models in target scenes.
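For a rough sense of scale, these guidelines imply the following number of training samples processed (back-of-the-envelope only; actual counts depend on the data loader and any gradient accumulation):

```python
# Samples processed = batch size x optimizer steps (illustrative arithmetic only).
batch_size = 128
pretrain_steps = 80_000
finetune_steps = 60_000

print(batch_size * pretrain_steps)  # 10,240,000 samples seen during pretraining
print(batch_size * finetune_steps)  # 7,680,000 samples seen per fine-tuning run
```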
#### Train model

```bash
# Run pretraining
python scripts/gr00t_finetune.py \
  --output-dir expdata/foundation_model_learning/pretraining \
  --dataset_soup pretrain_human300_mg60 \
  --max_steps 80000

# Target task fine-tuning: atomic-seen, composite-seen, and composite-unseen tasks.
# The following three training experiments can be run in parallel.
python scripts/gr00t_finetune.py \
  --output-dir expdata/foundation_model_learning/target_task_finetuning/atomic_seen \
  --base_model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
  --dataset_soup target_atomic_seen \
  --max_steps 60000

python scripts/gr00t_finetune.py \
  --output-dir expdata/foundation_model_learning/target_task_finetuning/composite_seen \
  --base_model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
  --dataset_soup target_composite_seen \
  --max_steps 60000

python scripts/gr00t_finetune.py \
  --output-dir expdata/foundation_model_learning/target_task_finetuning/composite_unseen \
  --base_model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
  --dataset_soup target_composite_unseen \
  --max_steps 60000
```
#### Evaluate model

```bash
# Evaluate the pretrained model on all three task sets
python scripts/run_eval.py \
  --model_path expdata/foundation_model_learning/pretraining/checkpoint-80000 \
  --task_set atomic_seen composite_seen composite_unseen \
  --split target

# Evaluate target fine-tuning: atomic-seen tasks
python scripts/run_eval.py \
  --model_path expdata/foundation_model_learning/target_task_finetuning/atomic_seen/checkpoint-60000 \
  --task_set atomic_seen \
  --split target

# Evaluate target fine-tuning: composite-seen tasks
python scripts/run_eval.py \
  --model_path expdata/foundation_model_learning/target_task_finetuning/composite_seen/checkpoint-60000 \
  --task_set composite_seen \
  --split target

# Evaluate target fine-tuning: composite-unseen tasks
python scripts/run_eval.py \
  --model_path expdata/foundation_model_learning/target_task_finetuning/composite_unseen/checkpoint-60000 \
  --task_set composite_unseen \
  --split target
```
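Since the three fine-tuned models are evaluated the same way, the per-split commands can be wrapped in a loop. Shown here in dry-run form (each command is echoed rather than executed, and the checkpoint paths assume the training output directories from the Train model section):

```shell
# Print the three per-split evaluation commands, one per line.
# Drop the leading 'echo' to actually run them.
for split in atomic_seen composite_seen composite_unseen; do
  echo python scripts/run_eval.py \
    --model_path "expdata/foundation_model_learning/target_task_finetuning/${split}/checkpoint-60000" \
    --task_set "${split}" \
    --split target
done
```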
#### Report evaluation results

```bash
python gr00t/eval/get_eval_stats.py \
  --dir <your-ckpt>
```