Hello! I am a final year Ph.D. student at Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, advised by Prof. Patrico A. Vela.

My research interest lies in computer vision, language processing, and the application of these fields to develop robotic intelligence. Specifically, I am working on Robotic Grasping (including both 6-DoF and planar), Language Command Understanding, and addressing Open World challenges. More recently, I am also developing algorithms that leverage Large Language Models or Vision-Language Models to enable generalizable planning or spatial understanding capacities.

Here is my resume/CV (updated October 2024). I am currently open to full-time opportunities in industry. Feel free to reach out!

🔥 News

2024.05: 🎉Excited to join the Microsoft Mixed Reality team as a research scientist intern.
2023.07: 📝One paper accepted to ICCV 2023, on a feature-based image Out-of-Distribution detection method.
2023.06: 📝One paper accepted to IROS 2023,on an improved keypoint-based 6-DoF grasp synthesis strategy.
2023.05: 🎉Excited to join the Amazon Robotics stow perception team as an applied scientist intern.
2023.01: 📝One paper accepted to ICLR 2023, on action sequence planning with the transformer model.
2023.01: 📝One paper accepted to ICRA 2023, on Keypoint-based 6-DoF grasp detection.
2021.01: 📝Two papers accepted to ICRA 2021, on language-conditioned robotic grasping and semantic-based pixel feature learning for the camera relocalization.

📖 Educations

2021.01 - 2025 (Expected): Ph.D. in Electrical and Computer Engineering, Georgia Tech. Advised by Dr. Patricio A. Vela. Atlanta, GA, United States.
2019.08 - 2020.12: M.S. in Electrical and Computer Engineering, Georgia Tech. Atlanta, GA, United States.
2015.09 - 2019.06: B.E. in Aerospace Engineering, Beihang University. Beijing, China.

💻 Industrial Experience

2024.05 - 2024.08: Research Scientist Internship, Microsoft Research.

Mentor: Ben Lundell; Co-Mentor:Harpreet Sawhney
Topic: Reasoning on scene graphs with Large Language Models (LLMs).
Redmond, WA, United States.

2023.05 - 2023.08: Applied Scientist Internship, Amazon Robotics.

Manager: Sisir Karumanchi; Mentor:Shuai Han
Topic: Uncertainty estimation on deep vision models for quantifying the robotic action reliability.
Seattle, WA, United States.

📝 Publications

(* denotes equal contribution)

2024

A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) (In submission)

Yiye Chen, Harpreet Sawhney, Nicholas Gyde, Yanan Jian, Jack Saunders, Patricio A. Vela, Benjamin Lundell

Code(Coming Soon) | Project(Coming Soon)

A two-agent LLMs framework for reasoning and planning on scene graphs, leveraging their reasoning, code-writing, and communication skills.

ICCV 2023

WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminant Analysis

Yiye Chen, Yunzhi Lin, Ruinian Xu, Patricio A. Vela

Code | Poster

A visual representation analysis approach to identify when a deep learning model doesn’t know in the open-world setting.
Showing effectiveness in various vision backbones, including ResNet, Vision Transformer, and CLIP vision encoder.

ICLR 2023

Planning with Language Models through Iterative Energy Minimization

Hongyi Chen*, Yilun Du*, Yiye Chen*, Patricio A. Vela, Joshua B. Tenenbaum

Project | Code

An energy-based learning and interative sampling method for action sequence planning with Transformer model.

IROS 2023

KGNv2: Separating Scale and Pose Prediction for Keypoint-based 6-DoF Grasp Synthesis on RGB-D input

Yiye Chen; Ruinian Xu; Yunzhi Lin; Hongyi Chen; Patricio A. Vela

Code | Presentation | Poster | Supplementary

Enhances Keypoint-GraspNet (see below) by addressing scale-related issues, where scale refers to the distance of a pose towards the single-view camera.

ICRA 2023

Keypoint-GraspNet: Keypoint-based 6-DoF Grasp Generation from the Monocular RGB-D input

Yiye Chen; Yunzhi Lin; Ruinian Xu; Patricio A. Vela

Code | Presentation | Poster | Supplementary

A keypoint-based approach for generating 6-DoF grasp poses from single-view RGB-D input.

ICRA 2021

A Joint Network for Grasp Detection Conditioned on Natural Language Commands

Yiye Chen; Ruinian Xu; Yunzhi Lin; Patricio A. Vela

Presentation | Supplementary

A language-conditioned robotic grasping method by fusing the visual and language embeddings.

GASP: Gaussian Avatars with Synthetic Priors, Jack Saunders, Charlie Hewitt, Yanan Jian, Marek Kowalski, Tadas Baltrusaitis, Yiye Chen, Darren Cosker, Virginia Estellers, Nicholas Gydé, Vinay Namboodiri, Benjamin Lundell, CVPR 2025.
Simultaneous Multi-Level Descriptor Learning and Semantic Segmentation for Domain-Specific Relocalization, Xiaolong Wu*, Yiye Chen*, Cédric Pradalier, Patricio A. Vela, ICRA 2021.

📜 Academic Services

Conference Reviewer: IROS’23-24, ICRA’24, CVPR’24-25, ICLR’25
Journal Reviewer: The International Journal of Robotics Research (IJRR), IEEE Robotics and Automation Letters (RA-L), IEEE Transactions on Industrial Electronics (TIE)