Featured
Blogs and Slides

View Transformer Layers from Online Optimization Perspective
First posted: May 20, 2025
Last updated: May 21, 2025
In this blog, we cover mesa-optimization, test-time training (TTT), and a broad view of fast-weight programming in transformer models.
PDF
[draft] Flow Matching Variant for Denoising Diffusion Codebook Models
Mar 16, 2025
In this blog, we introduce Denoising Diffusion Codebook Models (DDCM) and extend them to the flow-matching framework.
PDF
[draft] Introducing muP
Mar 13, 2025
In this blog, we introduce muP (Maximal Update Parametrization), a parametrization designed so that hyperparameters tuned on a small model transfer to larger model scales.
PDF
What is the Intrinsic Dimension of Your Data?
Jan 15, 2025
In this blog, we introduce the concept of intrinsic dimension and provide a method to estimate it. Remarkably, ImageNet has an estimated intrinsic dimension of only about 50.
Slides