Luowei Zhou
I am a research scientist at Google DeepMind.
Prior to Google DeepMind, I spent two years at Microsoft working on vision foundation models. Featured projects include Florence and ClipBERT.
I received my Ph.D. degree from the University of Michigan in 2020, under the supervision of Dr. Jason J. Corso. I worked on projects including vision-language pre-training (Unified VLP), YouCook2, and one of the first Visual Transformers (densecap).
I received my bachelor's degree from Nanjing University in 2015, where I worked on multi-agent RL. I spent summers interning at FAIR, MSR, and Salesforce Research. I am one of the winners of the CVPR 2021 Best Student Paper Honorable Mention.
Email  / 
CV  / 
Google Scholar  / 
Twitter  / 
LinkedIn  /
GitHub
|
|
News
[12/2023] Gemini is out!
[12/2023] Try out Bard on image captioning, search & VQA!
[07/2023] MaMMUT is accepted at TMLR.
[02/2023] MIST is accepted at CVPR'23.
[09/2022] 3 papers accepted at NeurIPS'22.
[07/2022] 2 papers accepted at ECCV'22.
[06/2022] Florence & Visual Clues are covered by The Economist! Check out XD's keynote at CVPR'22.
[03/2022] 3 papers accepted at CVPR'22.
[06/2021] Our ClipBERT won the CVPR'21 Best Student Paper HM!
[06/2021] My talk on recent advances in video-and-language pre-training, from our CVPR'21 vision+language tutorial, is available.
[06/2021] Congrats to Team RUC and INRIA on winning our CVPR'21 ActivityNet-Entities challenge. Video and report.
[06/2021] 1 paper accepted at NeurIPS'21 and 1 paper at ACL'21.
[06/2021] We will host the Video-And-Language Understanding Evaluation (VALUE) challenge at ICCV'21.
[03/2021] 2 papers accepted at CVPR'21.
[11/2020] Announcing YouCook Leaderboard, a one-stop shop for YouCook2 info & leaderboard.
[10/2020] Recognized as top 10% of high-scoring reviewers at NeurIPS'20.
[05/2020] Videos from our CVPR'20 tutorial on Recent Advances in V+L are available!
[04/2020] My defense recording "Language-Driven Video Understanding" is on YouTube. Editing credit: Dan Newman.
[09/2019] Work with Prof. Justin Johnson on the first Deep Learning for Computer Vision course at UMich. Recordings.
|
Research
I'm interested in computer vision and its relationship to natural language and deep learning, with a focus on learning visual representations from multimodal supervision. Problems of interest include multimodal learning (e.g., captioning, grounding, VQA), video understanding, unsupervised representation learning, generative models, and Transformers.
|
|
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering
Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou
CVPR, 2023
PDF
|
|
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng
NeurIPS, 2022
PDF /
Examples /
Covered by The Economist
|
|
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
NeurIPS, 2022
PDF /
Code
|
|
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan
NeurIPS, 2022
PDF
|
|
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You*, Luowei Zhou*, Bin Xiao*, Noel Codella*, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan
ECCV, 2022
PDF /
Code
|
|
DnA: Improving Few-shot Transfer Learning with Low-Rank Decomposition and Alignment
Ziyu Jiang, Tianlong Chen, Xuxi Chen, Yu Cheng, Luowei Zhou, Lu Yuan, Ahmed Awadallah, Zhangyang Wang
ECCV, 2022
PDF
|
|
BEVT: BERT Pretraining of Video Transformers
Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan
CVPR, 2022
PDF /
Code
|
|
CLIP-Event: Connecting Text and Images With Event Structures
Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang
CVPR, 2022 (Oral)
PDF /
Code
|
|
RegionCLIP: Region-Based Language-Image Pretraining
Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao
CVPR, 2022
PDF /
Code
|
|
Temporally Guided Articulated Hand Pose Tracking in Surgical Videos
Nathan Louis, Luowei Zhou, Steven J Yule, Roger D Dias, Milisa Manojlovich, Francis D Pagani, Donald S Likosky, Jason J. Corso
IJCARS, 2022
PDF /
Code
|
|
Florence: A New Foundation Model for Computer Vision
Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, Jianfeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang
arXiv, 2021
PDF /
Azure Blog /
XD's CVPR'22 Keynote /
Synced
|
|
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L Berg, Mohit Bansal, Jingjing Liu
Best Student Paper Honorable Mention award
CVPR, 2021 (Oral)
PDF /
Code
|
|
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training
Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu, Jingjing Liu
CVPR, 2021
PDF /
Code
|
|
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu
NeurIPS, 2021
PDF /
Benchmark /
ICCV'21 Challenge
|
|
Cluster-Former: Clustering-based Sparse Transformer for Question Answering
Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu
ACL (Findings), 2021
PDF
|
|
Language-Driven Video Understanding
Luowei Zhou
Dissertation, 2020
PDF /
Defense Recording
|
|
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
AAAI, 2020 (Spotlight)
PDF /
Code /
MSR Blog /
VentureBeat
|
|
Grounded Video Description
Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach
CVPR, 2019 (Oral)
PDF /
Code /
ActivityNet-Entities dataset /
CVPR'20 Challenge /
CVPR'21 Challenge
|
|
Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition
Hao Huang, Luowei Zhou, Wei Zhang, Jason J. Corso, Chenliang Xu
BMVC, 2019
PDF
|
|
End-to-End Dense Video Captioning With Masked Transformer
Luowei Zhou*, Yingbo Zhou*, Jason J. Corso, Richard Socher, Caiming Xiong
CVPR, 2018 (Spotlight)
PDF /
Code
|
|
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou, Chenliang Xu, Jason J. Corso
AAAI, 2018 (Oral)
PDF /
Code /
YouCook2 dataset /
Leaderboard
|
|
Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction
Luowei Zhou, Nathan Louis, Jason J. Corso
BMVC, 2018
PDF /
Code /
YouCook2-BB dataset
|
|
Multiagent Reinforcement Learning With Sparse Interactions by Negotiation and Knowledge Transfer
Luowei Zhou, Pei Yang, Chunlin Chen, Yang Gao
Journal Impact Factor: 19.12
IEEE Transactions on Cybernetics, 2017
PDF /
Code
|
|
A Balanced Heuristic Mechanism for Multirobot Task Allocation of Intelligent Warehouses
Luowei Zhou, Yuanyuan Shi, Jiangliu Wang, Pei Yang
MPE, 2014
PDF
|
|