What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments?

Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets

January 2024

We evaluate 5 different PVRs on 5 sim and 5 real benchmarks.

Abstract

We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study spans five different PVRs, two different policy-learning paradigms (imitation and reinforcement learning), and three different robots across five distinct manipulation and indoor navigation tasks. From this effort, we arrive at three insights: 1) the performance trends of PVRs in simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result with indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data augmentation and fine-tuning, also transfer to real-world performance. See the project website for additional details and visuals.

Type

Conference paper

Publication

In the International Conference on Robotics and Automation 2024 and ICRA Workshop on Mobile Manipulation and Embodied Intelligence, 2024 (Spotlight)

Robot Learning, Representation Learning

Karmesh Yadav

Ph.D. Student at Georgia Tech

Related