Wenhao Chai

Graduate Student @ University of Washington

About

Wenhao Chai

Docs (updated on 03/19/2025): CV | Email
Research: Google Scholar | GitHub | Hugging Face
Social Media: Twitter | Instagram | LinkedIn | Zhihu | Xiaohongshu


Wenhao Chai is currently a graduate student at the University of Washington, in the Information Processing Lab advised by Prof. Jenq-Neng Hwang. Previously, he was an undergraduate student at Zhejiang University, in the CVNext Lab advised by Prof. Gaoang Wang. He is fortunate to work with Prof. Christopher D. Manning at Stanford University, and has worked with Prof. Saining Xie and Prof. Yilun Du. He has interned at Pika Labs and Microsoft Research Asia. His research spans a wide range of topics in computer vision and deep learning. He has published papers in top-tier conferences and journals such as ICLR, CVPR, ICCV, ECCV, and AAAI. He has also organized workshops and tutorials at CVPR and AAAI, and served as a reviewer for NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, AAAI, COLM, AISTATS, and IJCV.


Check Out

News and Highlights

  • To junior master's/undergraduate students: if you would like to chat about life, career plans, or research ideas related to AI/ML, feel free to send me a Zoom / Google Meet invitation via email (wchai [at] uw [dot] edu) to schedule a meeting. I will dedicate at least 30 minutes every week to such meetings. It would be great if you could share a resume or webpage so I can learn about you before our conversation! I encourage students from underrepresented groups to reach out.
  • We are hosting a Discord server for professors and students for daily arXiv sharing and research discussion. Join us.
  • 2025 Fall CS Ph.D. application record.

  • I am making a schedule for CVPR 2025, from June 11th to June 15th, 2025, at the Music City Center in Nashville, TN. Message me if you'd like to join a road trip, have a coffee chat, or share a meal together.

  • 04/2025: We host the CVPR 2025 Video Understanding Challenge @ LOVEU.
  • 03/2025: I graduated from the University of Washington with a thesis on Large Multimodal Models for Video Captioning.
  • 02/2025: Three papers accepted to CVPR 2025.
  • 01/2025: Two papers accepted to ICLR 2025.
  • 12/2024: Two papers accepted to AAAI 2025.
  • 07/2024: Two papers accepted to ECCV 2024.
  • 06/2024: One technical report accepted to CVPR 2024 workshop @ NTIRE.
  • 06/2024: I join Pika Labs as an intern to develop next-generation video understanding and generation models.
  • 05/2024: One paper accepted to CVPR 2024 workshop @ Embodied AI.
  • 04/2024: We host the CVPR 2024 Long-form Video Understanding Challenge @ LOVEU.
  • 04/2024: Invited talk at the AgentX seminar about our STEVE series of works.
  • 03/2024: One paper accepted to ICLR 2024 workshop @ LLM Agents.
  • 02/2024: Two papers accepted to CVPR 2024, with one selected as a highlight (2.81%).
  • 02/2024: Invited talk at AAAI 2024 workshop @ IMAGEOMICS.
  • 12/2023: One paper accepted to AAAI 2024.
  • 09/2023: One paper accepted to ICCV 2023 workshop @ TNGCV-DataComp.
  • 07/2023: Two papers accepted to ICCV 2023.

View more

Recent

Projects

* Equal contribution.   Project lead.   Corresponding author.


AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning
International Conference on Learning Representations (ICLR), 2025
Project Page | Paper | Video | Model | Benchmark | Leaderboard | Poster | Code

AuroraCap is a multimodal LLM designed for image and video detailed captioning. We also release VDC, the first benchmark for detailed video captioning.

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
arXiv preprint, 2024
Project Page | Paper | Video | Raw Result | Code

SAMURAI is a zero-shot visual tracking framework that adapts the Segment Anything Model (SAM) to tracking with a motion-aware memory.

StableVideo: Text-driven Consistency-aware Diffusion Video Editing
Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
International Conference on Computer Vision (ICCV), 2023
Project Page | Paper | Video | Demo | Code

We introduce temporal dependency into existing text-driven diffusion models, which allows them to generate a consistent appearance for the new objects.

MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
Enxin Song*, Wenhao Chai*, Guanhong Wang*, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
Computer Vision and Pattern Recognition (CVPR), 2024
Project Page | Paper | Blog | Video | Dataset | Leaderboard | Code

MovieChat achieves state-of-the-art performance in extra-long video understanding (more than 10K frames) by introducing a memory mechanism.

Explore

More