Wenhao Chai

CS Ph.D. Student @
Princeton University




About

Wenhao Chai

Docs (updated on 06/25/2025): Curriculum Vitae
Research: Google Scholar | GitHub
Social Media: Twitter | Instagram | LinkedIn | Zhihu | Xiaohongshu


Wenhao Chai is a Ph.D. student in Computer Science at Princeton University, working with Prof. Zhuang Liu. He received his master's degree from the University of Washington in 2025 and his bachelor's degree from Zhejiang University in 2023. He was previously a research intern at Stanford University in the summer of 2024, working with Prof. Christopher D. Manning, and a visiting scholar at the University of Illinois Urbana-Champaign in the spring and summer of 2022. He has also interned at Pika Labs and Microsoft Research Asia. His research spans a wide range of topics in computer vision and machine learning, with previous work primarily on video understanding and generative models. He leads the development of MovieChat, the first Large Multi-Modal Model for hour-long video understanding. He has published research papers in top-tier conferences and journals such as ICLR, CVPR, ICCV, ECCV, ACL, IROS, and AAAI, and has co-organized workshops and challenges on video understanding at CVPR 2024 and 2025.

Check Out

News and Highlights

  • FAQ. To junior master's and undergraduate students: if you would like to chat about life, career plans, or research ideas related to AI/ML, feel free to reach out; I dedicate at least 30 minutes every week to such meetings. I especially encourage students from underrepresented groups to get in touch.
  • Internship. We are looking for interns and visiting students at Princeton University. Please refer to the linked page for more details. We welcome passionate individuals to join our research community, and feel free to reach out if you have any questions!
  • Join Discord. We host a Discord server for professors and students for daily arXiv sharing and research discussion.
  • Calendar. View my live-updated availability and upcoming events.

  • 09/2025: I join Princeton University as a CS Ph.D. student, working with Prof. Zhuang Liu. See my 2025 Fall application Record.
  • 06/2025: One paper accepted to ICCV 2025.
  • 06/2025: Featured in MIT Technology Review as one of the lead authors of LiveCodeBench Pro.
  • 06/2025: One paper accepted to IROS 2025.
  • 05/2025: One paper accepted to ACL 2025.
  • 04/2025: We host the CVPR 2025 Video Understanding Challenge @ LOVEU, sponsored by Lambda.
  • 03/2025: I graduate from the University of Washington with the thesis Large Multi-Modal Models for Video Captioning.
  • 02/2025: Three papers accepted to CVPR 2025.
  • 01/2025: Two papers accepted to ICLR 2025.
  • 12/2024: Two papers accepted to AAAI 2025.
  • 07/2024: Two papers accepted to ECCV 2024.
  • 06/2024: I join Pika Labs as an intern to develop next-generation video understanding and generation models.
  • 04/2024: We host the CVPR 2024 Long-form Video Understanding Challenge @ LOVEU.
  • 04/2024: Invited talk at the AgentX seminar about our STEVE series of works.
  • 02/2024: Two papers accepted to CVPR 2024 with one highlight (2.81%).
  • 02/2024: Invited talk at AAAI 2024 workshop @ IMAGEOMICS.
  • 12/2023: One paper accepted to AAAI 2024.
  • 09/2023: One paper accepted to ICCV 2023 workshop @ TNGCV-DataComp.
  • 07/2023: Two papers accepted to ICCV 2023.

View more

Recent

Projects

* Equal contribution.   Project lead.   Corresponding author.


LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Zihan Zheng*, Zerui Cheng*, Zeyu Shen*, Shang Zhou*, Kaiyuan Liu*, Hansen He*, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie
arXiv preprint, 2025
Project Page | Paper | Code | Data | MIT Technology Review

Models like o3-high, o4-mini, and Gemini 2.5 Pro score 0% on hard competitive programming problems.

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning
International Conference on Learning Representations (ICLR), 2025
Project Page | Paper | Video | Model | Benchmark | Leaderboard | Poster | Code

AuroraCap is a multimodal LLM designed for image and video detailed captioning. We also release VDC, the first benchmark for detailed video captioning.

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
arXiv preprint, 2024
Project Page | Paper | Video | Raw Result | Code

SAMURAI is a zero-shot visual tracking framework that adapts Segment Anything Model (SAM) for visual tracking with motion-aware memory.

MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
Enxin Song*, Wenhao Chai*, Guanhong Wang*, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
Computer Vision and Pattern Recognition (CVPR), 2024
Project Page | Paper | Blog | Video | Dataset | Leaderboard | Code

MovieChat achieves state-of-the-art performance in extra-long video understanding (more than 10,000 frames) by introducing a memory mechanism.

StableVideo: Text-driven Consistency-aware Diffusion Video Editing
Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
International Conference on Computer Vision (ICCV), 2023
Project Page | Paper | Video | Demo | Code

We introduce temporal dependency into existing text-driven diffusion models, which allows them to generate a consistent appearance for new objects.

Explore

More Pages

Calendar

Check my availability.

Mentoring

Research interns FAQ.

PhD Application Record

2025 Fall, Computer Science.

CVPR 2025

Summary and schedule.

Junior FAQ

Questions and answers.