What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments?

We conduct a study on using pre-trained visual representations (PVRs) to train robots for real-world tasks.

OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav

We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-the-art results on both the ImageNav and ObjectNav tasks without task-specific modules.