Featured
Blogs and Slides

Better and Longer Video Understanding
Aug 25, 2025
In this talk, we highlight the shift from traditional single-task models to efficient LLM-based systems for long, detailed, and knowledge-intensive video understanding.
Slides
ODE Perspective on Neural Networks
Jul 26, 2025
In this blog, we cover the neural ODE perspective in terms of optimization and architecture design.
PDF
View Transformer Layers from Online Optimization Perspective
First posted: May 20, 2025
Last updated: Jul 17, 2025
In this blog, we cover mesa-optimization, test-time training (TTT), and a broad view of fast weight programming in transformer models.
PDF
[draft] Flow Matching Variant for Denoising Diffusion Codebook Models
Mar 16, 2025
In this blog, we introduce Denoising Diffusion Codebook Models (DDCM) and extend them to the flow matching framework.
PDF
[draft] Introducing muP
Mar 13, 2025
In this blog, we introduce muP (Maximal Update Parametrization), a framework for studying how hyperparameters transfer across model scales.
PDF
What is the Intrinsic Dimension of Your Data?
Jan 15, 2025
In this blog, we introduce the concept of intrinsic dimension and provide a method to estimate it. Remarkably, ImageNet has an intrinsic dimension of only 50.
Slides