I am currently a Researcher at Microsoft. I received my Ph.D. from the University of Michigan in 2020, under the supervision of Dr. Jason J. Corso. I received my bachelor's degree from Nanjing University in 2015.
My research focuses on the intersection of computer vision and natural language processing (vision+language), including visual captioning, grounding, and question answering. My work relies heavily on deep learning and machine learning algorithms. My most recent efforts are on automatic video understanding; featured projects include vision-language pre-training (VLP), YouCook2, grounded video description, and densecap. Previously, I worked on Multi-Agent RL at Nanjing University. I have completed summer internships at Facebook AI Research, MSR, and Salesforce Research.