Luowei Zhou

Robotics Institute

University of Michigan

I am currently a Researcher at Microsoft. I received my Ph.D. degree from the University of Michigan in 2020, under the supervision of Dr. Jason J. Corso. I received my bachelor's degree from Nanjing University in 2015.

My research focuses on the intersection of computer vision and natural language processing (or vision+language), including visual captioning, grounding, and question answering. My work relies heavily on deep learning and machine learning algorithms. My most recent efforts are on automatic video understanding; featured projects include vision-language pre-training (VLP), YouCook2, grounded video description, and densecap. Previously, I worked on Multi-Agent RL at Nanjing University. I have completed summer internships at Facebook AI Research, MSR, and Salesforce Research.


  • [05/2020] Excited to join Microsoft Dynamics 365 AI Research! Check out our team members and publications.
  • [05/2020] Slides and videos from our CVPR'20 tutorial on Recent Advances in V+L are available now!
  • [04/2020] My thesis defense titled "Language-Driven Video Understanding" is available on YouTube now. Thanks to Dan for the editing/captions.
  • [04/2020] CVPR'20 Activity-Entities Object Localization (Grounding) Challenge (a part of the annual ActivityNet Challenge) has officially started! Click here for more details!
  • [04/2020] YouCook2 text-to-video retrieval task is hosted at the CVPR'20 Video Pentathlon Workshop. Also, check out this awesome demo built by Antoine!
  • [11/2019] Our VLP work is accepted by AAAI'20 (spotlight)! VLP is featured in the MSR blog, VentureBeat, and TDS.
  • [09/2019] We introduce our work on Unified Vision-Language Pre-training (VLP), which achieves SotA on image captioning and VQA (datasets: COCO/VQA2.0/Flickr30k) with a single model architecture. Code is available on GitHub. Try it out!
  • [12/2019] Upcoming service: program committee member/reviewer for CVPR, ECCV, NeurIPS, AAAI, ACL, EMNLP, ICML, IJCAI, ACM MM, etc.
  • [09/2019] I am working with Prof. Justin Johnson on a new class on Deep Learning for Computer Vision at UMich.
  • [04/2019] We released the source code for our CVPR'19 oral paper Grounded Video Description! The evaluation server for our dataset ANet-Entities is live on Codalab!
  • [04/2019] Our grounded video description paper is accepted by CVPR'19 (oral). We made the ActivityNet-Entities dataset (158k bboxes on 52k captions) available on GitHub, including evaluation scripts. Source code is on the way!
  • [07/2019] Our paper on Dynamic Graph Modules for Activity Recognition is accepted to BMVC'19.
  • [04/2019] We released the source code for our BMVC'18 work on weakly supervised video object grounding! The new YouCook2-BB dataset is available for download. The evaluation server is on Codalab!
  • [03/2019] Serving as a program committee member/reviewer for CVPR, ICCV, TPAMI, IJCV, AAAI, ACM MM, etc.
  • [02/2019] We released the source code on dense video captioning (CVPR 2018).
  • [09/2018] Our weakly-supervised video grounding paper is accepted by BMVC 2018.
  • [03/2018] Our dense video captioning paper is accepted by CVPR 2018 (spotlight).
  • [02/2018] I will join Facebook AI Research (FAIR) as a Research Intern in summer 2018.
  • [02/2018] I will co-organize the CVPR'18 Workshop on Fine-grained Instructional Video Understanding (FIVER).
  • [11/2017] Our paper on YouCook2 and procedure segmentation is accepted by AAAI 2018 (oral).