Karmesh Yadav

Ph.D. Student at Georgia Tech

Biography

Hi! I am Karmesh, a PhD student at Georgia Tech, advised by Prof. Dhruv Batra and Prof. Zsolt Kira. I am currently a research intern at Mistral AI, working on multimodal reasoning.

My research focuses on building agents that can reason, remember, and act in complex environments, with an emphasis on multimodal reasoning and embodied AI. Previously, I was an AI Resident at FAIR, working with the Habitat and Cortex team under the supervision of Dr. Oleksandr Maksymets and Prof. Batra. Before that, I worked as a Senior Robotics Engineer at ISEE, an autonomous vehicle startup automating yard trucks. I completed my Master's in Robotic Systems Development (MRSD) at the CMU Robotics Institute in 2020.

I am looking for full-time roles starting this fall. Please reach out if you think there may be a good fit.

Download my resumé.

Interests
  • Multimodal Reasoning
  • Agents
  • Embodied AI
Education
  • Ph.D. in Computer Science, 2026

    Georgia Institute of Technology

  • Master's in Robotic Systems Development, 2020

    Carnegie Mellon University

  • B.Tech in Mechanical Engineering, 2017

    Indian Institute of Technology, Guwahati

Experience

AI Scientist Intern, Mistral AI
May 2025 – Present, Palo Alto
  • Initiated and led the Multimodal Reasoning effort, establishing end-to-end training, evaluation, and data pipelines.
  • Developing rubric-guided reinforcement learning methods to reduce visual hallucinations and improve grounded reasoning in multimodal language models.
Intern, Technical Staff, AI
Aug 2024 – Nov 2024, San Francisco
  • Developed deployment pipelines for web agents to operate on real websites while effectively avoiding bot detection.
  • Created a data-filtering pipeline and trained VLM-based agents for web navigation tasks.
AI Resident, FAIR
Aug 2021 – Jun 2023, Menlo Park
  • Researched self-supervised pretraining techniques for learning useful visual representations for embodied agents.
  • Released the HM3D-Semantics (HM3DSem) dataset and the Open-Vocabulary Mobile Manipulation (OVMM) benchmark based on the Habitat Simulator.
Senior Robotics Engineer, ISEE
Jul 2020 – Aug 2021, Boston
  • Explored deep uncertainty estimation techniques for predicting the closed-loop tracking performance of an autonomous vehicle (AV) controller, and estimated the AV's collision probability with respect to obstacles in an occupancy grid.
  • Improved the trajectory optimization planner and made its collision checking more robust, which increased confidence in its performance and led to its deployment on the AV.
Software Development Intern, ISEE
May 2019 – Aug 2019, Boston
  • Built toolboxes to automate the system identification and calibration procedures for ISEE's vehicles.
  • Researched and implemented various vehicle and tire models for control applications in autonomous vehicles.
Intern, Autonomous Driving Team
Aug 2017 – Nov 2017, Hyderabad
  • Worked on improving the localization module of an autonomous vehicle by fusing ORB-SLAM output with GPS, IMU, and wheel odometry.

Publications & Preprints

OpenEQA: Embodied Question Answering in the Era of Foundation Models
We present a new embodied question answering (EQA) dataset with open-vocabulary questions.
What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments?
We conduct a study on using pre-trained visual representations (PVRs) to train robots for real-world tasks.
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
We present the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).
HomeRobot: Open-Vocabulary Mobile Manipulation
We propose a combined simulation and real-world benchmark on the problem of Open-Vocabulary Mobile Manipulation (OVMM).
Navigating to Objects Specified by Images
We present a modular system that can perform well on the Instance ImageNav task in both simulation and the real world.
Habitat-Matterport 3D Semantics Dataset
We present Habitat-Matterport 3D Semantics (HM3DSem), the largest dataset of 3D real-world spaces with densely annotated semantics.