About

Wenhao Chai

Docs (updated on 10/24/2025): Curriculum Vitae
Research: Google Scholar | GitHub
Social Media: X | Instagram | LinkedIn | Zhihu | Xiaohongshu


Wenhao Chai is a first-year Ph.D. student in Computer Science at Princeton University, working with Prof. Zhuang Liu. He received his master's degree from the University of Washington and his bachelor's degree from Zhejiang University. He previously studied at Stanford University, working with Prof. Christopher D. Manning, and at the University of Illinois Urbana-Champaign as a visiting scholar. He has interned at Pika Labs and Microsoft Research Asia. His research spans a wide range of topics in machine learning and computer vision. He leads the development of MovieChat, the first Large Multimodal Model and benchmark for hour-long video understanding. He is a core member of the LiveCodeBench Pro Team. His work has been recognized by MIT Technology Review. He has organized workshops and challenges on video understanding at CVPR 2024 and 2025.

Check Out

News and Highlights

  • FAQ. To junior master's and undergraduate students: if you would like to chat about life, career plans, or research ideas related to AI/ML, I dedicate at least 30 minutes every week to such meetings. I encourage students from underrepresented groups to reach out.
  • Internship. We are looking for interns at Princeton University. Please refer to the content in the link for more details. We value long-term projects and development over short-term paper output. We want to help you grow into real researchers, not engineers who might be replaced by Claude Code.
  • Join Discord. We host a Discord server for professors and students, with daily arXiv sharing and research discussion.
  • Calendar. View my live-updated availability and upcoming events.

  • 10/2025: Video-MMLU received the Outstanding Paper Award at ICCV 2025 Workshop @ Knowledge-Intensive Multimodal Reasoning, with a Travel Grant.
  • 09/2025: One paper accepted by NeurIPS 2025, two papers accepted by NeurIPS 2025 Datasets and Benchmarks Track with one Oral.
  • 09/2025: Invited talk at Abaka AI and 2077AI titled Better and Longer Video Understanding. Slides.
  • 09/2025: Joined Princeton University as a CS Ph.D. student, working with Prof. Zhuang Liu. 2025 Fall application Record.
  • 08/2025: One paper accepted by IEEE TPAMI.
  • 08/2025: UniHPR received the Best Paper Award at IEEE MIPR 2025.
  • 08/2025: LiveCodeBench Pro presented at the Open AGI Symposium at the University of California, Berkeley. Slides.
  • 07/2025: Interviewed by DeepTech and MIT Technology Review China. Report.
  • 06/2025: One paper accepted by ICCV 2025.
  • 06/2025: Featured in MIT Technology Review as one of the lead authors of LiveCodeBench Pro.
  • 06/2025: One paper accepted by IROS 2025.
  • 05/2025: One paper accepted by ACL 2025.
  • 04/2025: We host CVPR 2025 Video Understanding Challenge @ LOVEU sponsored by Lambda.
  • 03/2025: Graduated from the University of Washington with a Master's thesis on Large Multimodal Models for Video Captioning, nominated for the Distinguished Thesis Award by the ECE Department.
  • 02/2025: Three papers accepted by CVPR 2025.
  • 01/2025: Two papers accepted by ICLR 2025.
  • 12/2024: Two papers accepted by AAAI 2025.
  • 07/2024: Two papers accepted by ECCV 2024.
  • 06/2024: I work with Pika Labs as an intern to develop next-generation video understanding and generation models.
  • 04/2024: We host CVPR 2024 Long-form Video Understanding Challenge @ LOVEU.
  • 04/2024: Invited talk at AgentX seminar on our STEVE series of works.
  • 02/2024: Two papers accepted by CVPR 2024 with one highlight (2.81%).
  • 02/2024: Invited talk at AAAI 2024 workshop @ IMAGEOMICS.
  • 12/2023: One paper accepted by AAAI 2024.
  • 07/2023: Two papers accepted by ICCV 2023.

Recent

Projects

Preprint

October 2, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Sparse attention mechanisms also work for video understanding. We introduce a native sparse attention approach that efficiently scales to longer video sequences while maintaining strong performance.

Project Page | Paper | Code | Model

Publication

June 13, 2025

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

A benchmark composed of problems from Codeforces, ICPC, and IOI that are continuously updated to reduce the likelihood of data contamination. Models like o3-high, o4-mini, and Gemini 2.5 Pro score 0% on hard competitive programming problems.

Project Page | Paper | Data | MIT Technology Review | Code

Publication

October 3, 2024

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

AuroraCap is a multimodal LLM designed for image and video detailed captioning. We also release VDC, the first benchmark for detailed video captioning, enabling comprehensive evaluation of video understanding models.

Project Page | Paper | Video | Model | Benchmark | Leaderboard | Poster | Code

Preprint

September 30, 2024

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

SAMURAI is a zero-shot visual tracking framework that adapts Segment Anything Model (SAM) for visual tracking with motion-aware memory, enabling robust object tracking without requiring training data.

Project Page | Paper | Video | Raw Result | Code

Publication

July 31, 2023

MovieChat: From Dense Token to Sparse Memory in Long Video Understanding

MovieChat transforms dense video tokens into sparse memory representations for efficient long video understanding, enabling extended temporal reasoning while maintaining computational efficiency.

Project Page | Paper | Blog | Video | Dataset | Leaderboard | Code NPM

Publication

July 17, 2023

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

StableVideo presents a text-driven diffusion-based approach for consistent video editing, maintaining temporal coherence while enabling flexible content manipulation through natural language descriptions.

Project Page | Paper | Code