1

Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control

We investigate representations from pre-trained text-to-image diffusion models for control tasks and showcase competitive performance across a wide range of tasks.

OpenEQA: Embodied Question Answering in the Era of Foundation Models

We present a new embodied question answering (EQA) dataset with open vocabulary questions.

What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments?

We conduct a study on using pre-trained visual representations (PVRs) to train robots for real-world tasks.

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

We present the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).

HomeRobot: Open-Vocabulary Mobile Manipulation

We propose a combined simulation and real-world benchmark on the problem of Open-Vocabulary Mobile Manipulation (OVMM).

Navigating to Objects Specified by Images

We present a modular system that can perform well on the Instance ImageNav task in both simulation and the real world.

Habitat-matterport 3d semantics dataset

We present Habitat-Matterport 3D Semantics (HM3DSEM), the largest dataset of 3D real-world spaces with densely annotated semantics.

Last-Mile Embodied Visual Navigation

A last-mile navigation module that connects to prior policies, leading to improved image-goal navigation results in simulation and real-robot experiments.

OVRL: Offline Visual Representation Learning for Embodied Navigation

In this work we propose OVRL, a two-stage representation learning strategy for visual navigation tasks in Embodied AI.

La-MAML: Look-Ahead Meta-Learning for Continual Learning

In this work we develop a gradient-based meta-learning algorithm for efficient, online continual learning, that is robust and scalable to real-world visual benchmarks.