
Featured

Blogs and Slides

Better and Longer Video Understanding

Aug 25, 2025

In this talk, we highlight the shift from traditional single-task models to efficient LLM-based systems for long, detailed, and knowledge-intensive video understanding.

Slides
ODE Perspective on Neural Networks

Jul 26, 2025

In this blog, we examine neural networks from the neural ODE perspective, focusing on optimization and architecture design.

PDF
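As a minimal illustration of the ODE view (a sketch of our own, not code from the post), a residual block x ← x + h·f(x) is exactly one forward-Euler step of dx/dt = f(x), so a stack of residual blocks integrates that ODE. The toy vector field f(x) = -x and the helper name `residual_stack` are assumptions chosen only because the exact solution x(T) = x0·exp(-T) is known.

```python
import math

def residual_stack(x0, f, depth, horizon=1.0):
    """Apply `depth` residual blocks x <- x + h * f(x) with step h = horizon / depth.

    Each block is one forward-Euler step of the ODE dx/dt = f(x),
    so the whole stack integrates the ODE from t = 0 to t = horizon.
    """
    h = horizon / depth
    x = x0
    for _ in range(depth):
        x = x + h * f(x)
    return x

# Toy vector field (illustrative assumption): f(x) = -x, exact flow x(T) = x0 * exp(-T).
exact = math.exp(-1.0)
for depth in (1, 10, 100, 1000):
    approx = residual_stack(1.0, lambda x: -x, depth)
    print(f"depth={depth:5d}  output={approx:.5f}  |error|={abs(approx - exact):.5f}")
```

The error shrinks roughly in proportion to 1/depth, the first-order accuracy of Euler's method; this is the sense in which a very deep residual network with small per-block updates approximates a continuous-time flow.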
Viewing Transformer Layers from an Online Optimization Perspective

First posted: May 20, 2025
Last updated: Jul 17, 2025

In this blog, we cover mesa-optimization, test-time training (TTT), and, more broadly, fast-weight programming in transformer models.

PDF
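As a minimal sketch of the fast-weight view (our illustration, not code from the post), causal unnormalized linear attention can be rewritten as a matrix W that takes one gradient step per token and is then read out with the query. Swapping the loss for a squared regression error yields the delta rule, which is closely related to test-time-training layers with a linear inner model. The function names, the pure-Python list representation, and the toy inputs are assumptions; the post's layers may add projections, normalization, or learned step sizes.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    return [dot(row, x) for row in W]

def linear_attention(qs, ks, vs):
    """Causal, unnormalized linear attention: o_t = sum_{i <= t} (q_t . k_i) v_i."""
    return [
        [sum(dot(q, ks[i]) * vs[i][c] for i in range(t + 1)) for c in range(len(vs[0]))]
        for t, q in enumerate(qs)
    ]

def fast_weight_layer(qs, ks, vs, lr=1.0, delta_rule=False):
    """Fast-weight view: W takes one gradient step per token, then is read out with q_t.

    delta_rule=False: loss -v_t^T W k_t, gradient -v_t k_t^T
                      -> W += lr * v_t k_t^T            (Hebbian write = linear attention)
    delta_rule=True:  loss 0.5 * ||W k_t - v_t||^2, gradient (W k_t - v_t) k_t^T
                      -> W += lr * (v_t - W k_t) k_t^T  (delta rule)
    """
    W = [[0.0] * len(ks[0]) for _ in range(len(vs[0]))]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        # Error signal: the raw value (Hebbian) or the prediction residual (delta rule).
        err = [vi - pi for vi, pi in zip(v, matvec(W, k))] if delta_rule else v
        W = [[W[r][c] + lr * err[r] * k[c] for c in range(len(k))] for r in range(len(W))]
        outputs.append(matvec(W, q))
    return outputs

# Hypothetical toy sequence of three tokens with 2-d queries, keys, and values.
qs = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]]
ks = [[1.0, 0.0], [0.5, 1.0], [0.3, 0.3]]
vs = [[2.0, -1.0], [0.0, 1.5], [1.0, 1.0]]
print(fast_weight_layer(qs, ks, vs))
print(linear_attention(qs, ks, vs))
```

With lr = 1 the two printed lists agree, since W_t = Σ_{i≤t} v_i k_iᵀ. The delta rule differs when a key is written twice: it overwrites the stored value instead of accumulating it.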
[draft] Flow Matching Variant for Denoising Diffusion Codebook Models

Mar 16, 2025

In this blog, we introduce Denoising Diffusion Codebook Models (DDCM) and extend them to the flow matching framework.

PDF
[draft] Introducing muP

Mar 13, 2025

In this blog, we introduce muP (Maximal Update Parametrization), a parametrization designed so that optimal hyperparameters transfer across model scales, allowing them to be tuned on a small model and reused on a larger one.

PDF
What is the Intrinsic Dimension of Your Data?

Jan 15, 2025

In this blog, we introduce the concept of intrinsic dimension and present a method to estimate it. Strikingly, ImageNet has an intrinsic dimension of only about 50.

Slides
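The blurb does not say which estimator the post uses, so as a stand-in we sketch the maximum-likelihood variant of the TwoNN estimator (Facco et al., 2017), which may differ from the post's method. It needs only each point's first and second nearest-neighbour distances. The synthetic data (a low-dimensional Gaussian linearly embedded in 10 dimensions) and the helper names are assumptions for illustration.

```python
import math
import random

def two_nn_dimension(points):
    """Maximum-likelihood TwoNN estimate of intrinsic dimension (Facco et al., 2017).

    For each point, mu = r2 / r1 is the ratio of its 2nd- to 1st-nearest-neighbour
    distance. If density is roughly constant at that scale, log(mu) is exponentially
    distributed with rate d, so the MLE is d = N / sum(log mu).
    Assumes no duplicate points (r1 > 0); brute-force O(N^2), fine for small N.
    """
    log_mu_sum = 0.0
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            r = math.dist(p, q)
            if r < r1:
                r1, r2 = r, r1
            elif r < r2:
                r2 = r
        log_mu_sum += math.log(r2 / r1)
    return len(points) / log_mu_sum

def embedded_gaussian(n, latent_dim, ambient_dim, seed=0):
    """Synthetic data: latent_dim-dimensional Gaussian samples mapped linearly into ambient_dim."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(ambient_dim)]
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        data.append([sum(a * zi for a, zi in zip(row, z)) for row in A])
    return data

for true_dim in (2, 5):
    est = two_nn_dimension(embedded_gaussian(500, true_dim, 10, seed=true_dim))
    print(f"true intrinsic dim={true_dim}, ambient dim=10, TwoNN estimate={est:.2f}")
```

Both estimates should land near the latent dimension even though every point has 10 coordinates. This toy setup makes no claim to reproduce the ImageNet figure, which depends on the estimator and preprocessing used in the post.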
Bridging the Parallel Decoding of LLMs with the Diffusion Process

Oct 30, 2024

In this blog, we introduce Jacobi decoding, a parallel decoding algorithm for LLMs, and discuss its high-level conceptual connection to the diffusion process.

English
Chinese
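A minimal sketch of the Jacobi decoding idea (our illustration, not the post's code): greedy decoding of n tokens is treated as the system y_i = next_token(prompt + y_<i) for all i, and Jacobi iteration refreshes every position in parallel from the previous guess until nothing changes. Loosely like diffusion sampling, it starts from an arbitrary sequence and refines all positions at once. The toy `toy_next_token` function is a hypothetical stand-in for an LLM's greedy step.

```python
def toy_next_token(prefix):
    """Hypothetical stand-in for a greedy LLM step: a deterministic function of the prefix."""
    return (31 * sum(prefix) + 7 * len(prefix)) % 50

def autoregressive_decode(prompt, n, next_token):
    seq = list(prompt)
    for _ in range(n):  # n sequential model calls
        seq.append(next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, n, next_token, init=None):
    """Solve y_i = next_token(prompt + y_<i) for all i by Jacobi fixed-point iteration.

    Position i is guaranteed correct after i + 1 iterations, so at most n are needed.
    Returns (tokens, number_of_parallel_iterations).
    """
    guess = list(init) if init is not None else [0] * n
    for it in range(1, n + 1):
        # With a real causal LLM, this whole list comes from one forward pass over prompt + guess.
        new = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        if new == guess:  # fixed point: no position changed
            return new, it
        guess = new
    return guess, n

prompt = [3, 1, 4]
tokens, iters = jacobi_decode(prompt, 8, toy_next_token)
print(tokens == autoregressive_decode(prompt, 8, toy_next_token), iters)
```

The fixed point always equals the greedy autoregressive output. With this hash-like toy model nearly every token depends on the whole prefix, so little parallelism is gained; any practical speedup depends on how often a real model predicts later tokens correctly before their prefix has settled.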