Dex1B is a large-scale, diverse, and high-quality demonstration dataset generated using generative models. It contains one billion demonstrations for two fundamental tasks: grasping and articulation.


To construct it, we propose a generative model named DexSimple, which integrates geometric constraints to improve feasibility and incorporates additional conditions to enhance diversity. Its effectiveness is demonstrated through both simulation benchmarks and real-world robot experiments.


Technical Summary Video

Manipulation Demonstrations

Dex1B leverages simulation and generative models to construct a billion-scale demonstration dataset for grasping and articulation.

* Click "grasping" or "articulation" above to show the corresponding visualizations.

For grasping, we construct 1 million scenes using object assets from Objaverse; for articulation, we construct scenes using object assets from PartNet-Mobility. We first use optimization techniques to build seed datasets containing a small number of demonstrations, then employ DexSimple to expand them to billion scale. All demonstrations are validated in the ManiSkill/SAPIEN simulator.

Real-World Deployment

We deploy the DexSimple policy in a zero-shot sim-to-real fashion. In the video above, we mount the Ability Hand onto an XArm and place a calibrated RealSense camera in a third-person view. As shown, the grasping motions are smooth and stable, and the policy predicts diverse grasps for different objects.

Spatial Generalizability

We showcase the spatial generalizability of our policy in the videos above. The object is placed at various locations sequentially and is successfully grasped.

Iterative Data Generation

We first use an optimization algorithm to generate a seed dataset, which serves as training data for DexSimple. DexSimple then samples a scaled-up proposal dataset, which is validated in simulation. We intentionally debias the hand poses and objects to increase diversity.
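The iterative loop above can be sketched in a few lines. This is a toy illustration, not the authors' pipeline: every function below (the optimizer, the sampler, and the simulator check) is a stand-in for the real component named in its comment, and the demonstration format is invented for the sketch.

```python
import random

random.seed(0)

def optimize_seed_demo(obj):
    # Stand-in for optimization-based demonstration synthesis (seed dataset).
    return {"object": obj, "pose": random.random()}

def sample_proposal(training_data, obj):
    # Stand-in for DexSimple sampling: perturb a random training demonstration.
    base = random.choice(training_data)
    return {"object": obj, "pose": base["pose"] + random.uniform(-0.1, 0.1)}

def simulate_success(demo):
    # Stand-in for validation in the ManiSkill/SAPIEN simulator.
    return 0.0 <= demo["pose"] <= 1.0

def generate_dataset(objects, target_size, seed_size=10):
    # 1. Build a small seed dataset with the optimizer.
    dataset = [optimize_seed_demo(o) for o in random.choices(objects, k=seed_size)]
    while len(dataset) < target_size:
        # 2. Sample a larger batch of proposals from the generative model,
        # 3. keep only those that succeed in simulation, and repeat.
        proposals = [sample_proposal(dataset, o)
                     for o in random.choices(objects, k=2 * len(dataset))]
        dataset += [p for p in proposals if simulate_success(p)]
    return dataset[:target_size]

demos = generate_dataset(objects=["mug", "drawer"], target_size=100)
print(len(demos))  # 100
```

In the real pipeline the model is retrained on the growing dataset between rounds and the validated set is debiased before being added back; both steps are elided here for brevity.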

DexSimple Model

We condition the CVAE model on hand parameters as well as local object point features; each set of hand parameters is associated with a local object point. The model is supervised with the standard MSE and KL losses, along with an approximate SDF loss that enforces geometric constraints.
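A minimal NumPy sketch of such a combined objective is shown below. The tensor shapes, the loss weights, and the way the SDF penalty is formed (penalizing hand points with negative object SDF, i.e. points inside the object) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: 24 hand parameters, 16 latent dims, SDF queried at
# 64 points on the hand surface. Values are random placeholders.
pred_hand = rng.normal(size=24)            # decoder output
gt_hand = rng.normal(size=24)              # ground-truth hand parameters
mu = rng.normal(size=16)                   # encoder posterior mean
logvar = rng.normal(size=16)               # encoder posterior log-variance
sdf_at_hand_points = rng.normal(size=64)   # object SDF at hand surface points

# Standard CVAE terms: reconstruction MSE and KL to a unit Gaussian prior.
mse_loss = np.mean((pred_hand - gt_hand) ** 2)
kl_loss = -0.5 * np.mean(1.0 + logvar - mu**2 - np.exp(logvar))

# Approximate SDF penalty: hinge on negative SDF values, so only hand
# points that penetrate the object surface are penalized.
sdf_loss = np.mean(np.maximum(-sdf_at_hand_points, 0.0))

# Loss weights are placeholders, not the paper's values.
total_loss = mse_loss + 1e-2 * kl_loss + 1.0 * sdf_loss
```

The hinge keeps the geometric term one-sided: points outside the object (positive SDF) contribute nothing, so the constraint discourages penetration without pulling the hand onto the surface.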

Method

BibTeX

@article{ye2025dex1b,
  title={Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation},
  author={Ye, Jianglong and Wang, Keyi and Yuan, Chengjing and Yang, Ruihan and Li, Yiquan and Zhu, Jiyue and Qin, Yuzhe and Zou, Xueyan and Wang, Xiaolong},
  journal={Robotics: Science and Systems (RSS)},
  year={2025}
}