Download
Featured
Codebases

Featured
Datasets

Science T2I @ CVPR 2025
We collect over 20k image pairs, enabling the training of a language-guided reward model for text-to-image alignment with scientific knowledge.
VDC @ ICLR 2025
The first benchmark for detailed video captioning, featuring over one thousand videos with significantly longer and more detailed captions.
RT-Pose @ ECCV 2024
A human pose estimation (HPE) dataset, consisting of calibrated radar ADC data, 4D radar tensors, stereo RGB images, and LiDAR point clouds.
CityCraft
Includes 2D semantic layouts of urban areas, corresponding satellite images, and high-quality 3D building assets.
MovieChat @ CVPR 2024
A manually labeled long-video QA and captioning dataset containing 1,000 videos, each longer than ten thousand frames.
VFD-2000
A video fight detection dataset collected from YouTube, containing 2,000 video clips in diverse scenarios.
Featured
Surveys
Featured
Templates
Featured
Posters
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
ICLR 2025, Singapore
Ego3DT: Tracking Every 3D Object in Ego-centric Videos
ACM MM 2024, Melbourne, Australia
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
CVPR 2024, Seattle, WA
STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft
CVPR 2024 workshop, Seattle, WA
UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning
AAAI 2024, Vancouver, Canada
StableVideo: Text-driven Consistency-aware Diffusion Video Editing
ICCV 2023, Paris, France
PAD: Personalized Alignment of LLMs at Decoding-Time
ICLR 2025, Singapore
See and Think: Embodied Agent in Virtual Environment
ECCV 2024, Milano, Italy
Learning Diffusion Texture Priors for Image Restoration
CVPR 2024, Seattle, WA
Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation
ICLR 2024 workshop, Vienna, Austria
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
WACV 2024, Waikoloa, Hawaii
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
ICCV 2023, Paris, France