Author 2 · Author 3 · Author 4
Princeton University
Video-MMLU pushes LMMs to their limits: can a model truly understand real-world lectures?

FAQs

What is Video-MMLU?
Video-MMLU is a benchmark designed to evaluate large multimodal models (LMMs) on their ability to understand and reason about real-world lecture videos across multiple domains and disciplines.

How does Video-MMLU differ from existing benchmarks?
Unlike text-only or image-only benchmarks, Video-MMLU specifically tests comprehension of educational video content, requiring models to integrate visual, auditory, and temporal information to answer challenging questions.

How can I access the benchmark?
The benchmark dataset and evaluation code are available through our GitHub repository. Researchers can use them to test their own models and compare results against existing baselines.