Large-scale Cooking Video Dataset for Procedure Learning and Recipe Generation
We collected YouCook2, which contains 2000 long videos with temporally localized recipe sentence annotations. We decomposed the recipe generation task into two sub-tasks: event proposal and video segment captioning. The event proposal network temporally localizes procedure steps in web instructional videos, and our self-attention-based video captioning module describes each step in a natural language sentence. Finally, we bridged the event proposal module and the captioning module with a differentiable visual mask and achieved state-of-the-art results. Later works include weakly supervised visual grounding from language descriptions.
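To give a rough idea of how a differentiable mask can connect a proposed segment to a captioner, here is a minimal sketch. It is an illustration under assumptions, not the exact formulation used in this work: the function names, the product-of-sigmoids gate, and the `sharpness` parameter are all hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_temporal_mask(num_frames, start, end, sharpness=5.0):
    """Smooth gate over frame indices (hypothetical helper).

    The value is close to 1 for frames inside [start, end] and close to 0
    outside. Because it is smooth in `start` and `end`, a captioning loss
    could in principle send gradients back to the proposal boundaries.
    """
    return [
        sigmoid(sharpness * (t - start)) * sigmoid(sharpness * (end - t))
        for t in range(num_frames)
    ]


def apply_mask(features, mask):
    # Scale each per-frame feature vector by its mask weight.
    return [[m * v for v in frame] for frame, m in zip(features, mask)]


# Ten frames of toy 2-d features; the proposed step spans frames 2 to 8.
features = [[1.0, 2.0] for _ in range(10)]
mask = soft_temporal_mask(num_frames=10, start=2.0, end=8.0)
masked_features = apply_mask(features, mask)
```

A hard 0/1 mask would select the same frames but has zero gradient with respect to the boundaries, which is why a smooth gate is used in this sketch.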
Visual Captioning with Text/Audio-guided Long Short-Term Memory
We proposed an LSTM-based image captioning method that uses explicit text-conditional semantic attention and achieves state-of-the-art performance. The paper is available on arXiv and the source code is on GitHub. We also applied this method to video captioning, using audio data for semantic guidance, and participated in the 2016 ACM Multimedia Video-to-Text Challenge. Currently, I am working on temporal segmentation and captioning of instructional videos based on deep neural networks. We collected a large-scale dataset for instructional video analysis, named YouCookII, which will be publicly available soon.
Multi-agent RL by Negotiation and Knowledge Transfer (undergrad thesis)
We introduced a new algorithm for multi-agent reinforcement learning (MARL) problems, named negotiation-based MARL with sparse interactions (NegoSI). In contrast to traditional sparse-interaction-based MARL algorithms, the proposed method adopts the equilibrium concept, making it possible for agents to select an equilibrium strategy for their joint action. We first compared NegoSI with three other state-of-the-art algorithms on six benchmarks, and then put it into practice and tested it on our intelligent warehouse simulation platform. The results were published in IEEE Transactions on Cybernetics.
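As a generic illustration of the equilibrium concept mentioned above, the sketch below enumerates pure-strategy Nash equilibria of a two-agent matrix game. It is not the NegoSI negotiation procedure itself; the function name, the payoff values, and the corridor scenario are assumptions made up for this example.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all joint actions (i, j) that are pure-strategy Nash equilibria.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to agents A and B when
    A plays action i and B plays action j. A joint action is an equilibrium
    if neither agent can gain by changing its own action alone.
    """
    n_rows = len(payoff_a)
    n_cols = len(payoff_a[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            a_is_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n_rows))
            b_is_best = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(n_cols))
            if a_is_best and b_is_best:
                equilibria.append((i, j))
    return equilibria


# Hypothetical narrow-corridor game: action 0 = move, action 1 = wait.
# Both agents moving causes a collision (-10); one moving while the
# other waits lets the mover make progress (+1).
payoff_a = [[-10, 1], [0, 0]]
payoff_b = [[-10, 0], [1, 0]]
equilibria = pure_nash_equilibria(payoff_a, payoff_b)
```

In this toy game the two equilibria are the joint actions in which exactly one agent moves, so the agents still need some way to agree on which one to play; that coordination step is what a negotiation mechanism has to resolve.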