Code | Datasets
Featured Codebases

Featured Datasets

Science T2I
We collect over 20k image pairs, enabling the training of a language-guided reward model for text-to-image alignment with scientific knowledge.
VDC
The first benchmark for detailed video captioning, featuring over one thousand videos with significantly longer and more detailed captions.
RT-Pose
A human pose estimation (HPE) dataset consisting of calibrated radar ADC data, 4D radar tensors, stereo RGB images, and LiDAR point clouds.
CityCraft
Includes 2D semantic layouts of urban areas, corresponding satellite images, and high-quality 3D building assets.
MovieChat
A manually labeled long-video QA and captioning dataset containing 1,000 videos, each longer than ten thousand frames.
VFD-2000
A video fight detection dataset collected from YouTube, containing 2,000 video clips across diverse scenarios.