
This research introduces Posterior Behavioral Cloning (POSTBC), a pretraining method designed to improve reinforcement learning (RL) finetuning of robotic policies. Standard behavioral cloning often yields near-deterministic policies that overfit the demonstration data, and the resulting lack of action coverage hampers exploration during subsequent RL finetuning. To address this, the authors train the policy to model the posterior distribution over the demonstrator's behavior, which naturally raises entropy and action diversity in states where data is scarce. The resulting policy remains competent in well-covered states while exploring broadly enough to gather the diverse experience needed for efficient online improvement. Experiments on simulated robotic benchmarks and real-world manipulation tasks show that POSTBC substantially improves finetuning efficiency without sacrificing initial performance. Ultimately, the work demonstrates that an uncertainty-aware initialization is a critical, yet previously overlooked, factor in achieving human-level robotic control.
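To make the core idea concrete, below is a minimal sketch of one common way to approximate a posterior predictive policy: a deep ensemble of behavior-cloned Gaussian policies. This is an illustrative assumption, not the paper's stated implementation, and the names (`GaussianPolicy`, `fit_bc`, `EnsemblePosteriorPolicy`) are hypothetical. The key mechanism it demonstrates is that acting by first sampling an ensemble member induces a mixture policy whose entropy grows where members disagree, i.e. in states with little demonstration coverage.

```python
# Minimal sketch: approximating posterior behavioral cloning with a deep
# ensemble (an assumed stand-in for the paper's actual posterior estimator).
# Each member is fit to the demonstrations independently from a different
# random initialization; acting means sampling a member, then its action.
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    """Small MLP outputting a diagonal Gaussian over actions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> torch.distributions.Normal:
        h = self.backbone(obs)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())


def fit_bc(policy: GaussianPolicy, obs, act, epochs: int = 50, lr: float = 1e-3):
    """Standard behavioral cloning: maximize demo action log-likelihood."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = -policy.dist(obs).log_prob(act).sum(-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()


class EnsemblePosteriorPolicy:
    """Mixture over ensemble members, approximating the posterior predictive
    over the demonstrator's action. Members agree on well-covered states
    (low entropy) and disagree off-distribution (high entropy)."""

    def __init__(self, members):
        self.members = members

    @torch.no_grad()
    def act(self, obs: torch.Tensor) -> torch.Tensor:
        idx = torch.randint(len(self.members), (1,)).item()
        return self.members[idx].dist(obs).sample()


if __name__ == "__main__":
    obs_dim, act_dim, n_demos, n_members = 4, 2, 256, 5
    demo_obs = torch.randn(n_demos, obs_dim)  # placeholder demonstrations
    demo_act = torch.randn(n_demos, act_dim)
    members = [GaussianPolicy(obs_dim, act_dim) for _ in range(n_members)]
    for m in members:
        fit_bc(m, demo_obs, demo_act)
    policy = EnsemblePosteriorPolicy(members)
    print(policy.act(torch.randn(1, obs_dim)))
```

Under this sketch, the uncertainty-aware behavior falls out of the ensemble for free: no entropy bonus is added to the BC objective, so in-distribution performance is unchanged, while the mixture supplies extra action diversity exactly where the demonstrations are sparse.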