Karmesh Yadav

Ph.D. Student at Georgia Tech

Georgia Tech

Biography

Hi! I am Karmesh, a PhD student at Georgia Tech, advised by Prof. Dhruv Batra and Prof. Zsolt Kira.

My PhD research focuses on creating better pretraining strategies for Embodied AI agents. Previously, I was an AI Resident at FAIR, working with the Habitat and Cortex teams under the supervision of Dr. Oleksandr Maksymets and Prof. Batra. Before that, I worked as a Senior Robotics Engineer at ISEE, an autonomous-vehicle startup working on automating yard trucks. I completed my Masters in Robotic Systems Development (MRSD) at the CMU Robotics Institute in 2020.

Download my resumé.

Interests

Embodied AI
Robot Learning
Reinforcement Learning

Education

Ph.D. in Computer Science, 2026
Georgia Institute of Technology
Masters in Robotic Systems Development, 2020
Carnegie Mellon University
B.Tech in Mechanical Engineering, 2017
Indian Institute of Technology, Guwahati

Experience

Intern, Technical Staff, AI

Yutori

Aug 2024 – Nov 2024 San Francisco

Developed deployment pipelines for web agents to operate on real websites while effectively avoiding bot detection.
Created a data filtering pipeline and trained VLM-based agents for web navigation tasks.

AI Resident

Facebook AI Research

Aug 2021 – Jun 2023 Menlo Park

Worked on self-supervised pretraining techniques for learning useful representations for embodied agents.
Released the HM3D-Semantics dataset and the Open-Vocabulary Mobile Manipulation benchmark, both based on the Habitat simulator.
Organised multiple challenges on Embodied Navigation and Rearrangement at CVPR and NeurIPS.

Robotics Engineer

isee

Jul 2020 – Aug 2021 Pittsburgh

Explored deep uncertainty estimation techniques for predicting the closed-loop tracking performance of an autonomous vehicle controller. Estimated the collision probability of the AV with respect to obstacles in an occupancy grid.
Improved the trajectory optimization planner and made its collision checking more robust. This increased confidence in its performance and led to its deployment on the AV.
Developed the speed planning module to safely achieve a three-fold increase in the operating speed of the AV.

Software Development Intern

isee

May 2019 – Aug 2019 Boston

Built toolboxes to automate the system identification and calibration procedures for ISEE's vehicles.
Researched and implemented various vehicle and tire models for control applications in AVs.

Publications & Preprints

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

Memo enables efficient memory formation and usage for long-horizon embodied RL tasks, improving generalization and efficiency.

Gunshi Gupta, Karmesh Yadav, Zsolt Kira, Yarin Gal, Rahaf Aljundi

Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control

We investigate representations from pre-trained text-to-image diffusion models for control tasks and showcase competitive performance across a wide range of tasks.

Gunshi Gupta, Karmesh Yadav, Yarin Gal, Dhruv Batra, Zsolt Kira, Cong Lu, Tim G. J. Rudner

OpenEQA: Embodied Question Answering in the Era of Foundation Models

We present a new embodied question answering (EQA) dataset with open vocabulary questions.

Arjun Majumdar, Anurag Ajay, Xiaohan Zhang, Pranav Putta, Sriram Yenamandra, Mikael Henaff, Sneha Silwal, Paul Mcvay, Oleksandr Maksymets, Sergio Arnaud, Karmesh Yadav, Qiyang Li, Ben Newman, Mohit Sharma, Vincent Berges, Shiqi Zhang, Pulkit Agrawal, Yonatan Bisk, Dhruv Batra, Mrinal Kalakrishnan, Franziska Meier, Chris Paxton, Sasha Sax, Aravind Rajeswaran

What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments?

We conduct a study on using pre-trained visual representations (PVRs) to train robots for real-world tasks.

Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

We present the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).

Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Tingfan Wu, Jay Vakil, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

HomeRobot: Open-Vocabulary Mobile Manipulation

We propose a combined simulation and real-world benchmark on the problem of Open-Vocabulary Mobile Manipulation (OVMM).

Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin S Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander Clegg, John M Turner, Zsolt Kira, Manolis Savva, Angel X Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

Navigating to Objects Specified by Images

We present a modular system that can perform well on the Instance ImageNav task in both simulation and the real world.

Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

Habitat-Matterport 3D Semantics Dataset

We present Habitat-Matterport 3D Semantics (HM3DSEM), the largest dataset of 3D real-world spaces with densely annotated semantics.

Karmesh Yadav, Ram Ramrakhya, Santhosh K. Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel X. Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg, Devendra Singh Chaplot