GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities

Rao Fu, Dingxi Zhang, Alex Jiang, Wanjia Fu, Austin Funk, Daniel Ritchie, Srinath Sridhar
1Brown University  2ETH Zurich  *Equal contribution
[Teaser Figure]

We present GigaHands, a massive annotated bimanual hand activity dataset, unlocking new possibilities for animation, robotics, and beyond. Each column above shows an activity sequence from the dataset. The bottom row shows other annotations in the dataset, including text, hand shape, and object shape and pose (left half of each image). The right half of each image shows a novel view from dynamic radiance field fitting.


Abstract

Understanding bimanual human hand activities is a critical problem in AI and robotics. We cannot build large models of bimanual activities because existing datasets lack the scale, coverage of diverse hand activities, and detailed annotations. We introduce GigaHands, a massive annotated dataset capturing 34 hours of bimanual hand activities from 56 subjects and 417 objects, totaling 14k motion clips derived from 183 million frames paired with 84k text annotations. Our markerless capture setup and data acquisition protocol enable fully automatic 3D hand and object estimation while minimizing the effort required for text annotation. The scale and diversity of GigaHands enable broad applications, including text-driven action synthesis, hand motion captioning, and dynamic radiance field reconstruction.



Dataset Annotations

To be released soon

GigaHands is a diverse, massive, and fully annotated dataset of 3D bimanual hand activities. All sequences in GigaHands are fully annotated with: detailed activity text descriptions; 3D hand shape and pose; 3D object shape, pose, and appearance; hand/object segmentation masks; 2D/3D hand keypoints; and camera poses.
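
A minimal sketch of how the per-clip annotations might be consumed in Python. The directory layout, file names, and array shapes below are illustrative assumptions (e.g., MANO-style hand parameters and 21 keypoints per hand), not the released format:

import json
from pathlib import Path
import numpy as np

# Hypothetical per-clip layout; the released GigaHands file names may differ.
clip_dir = Path("gigahands/subject_001/clip_0000")

# Text annotation: a natural-language description of the activity.
text = json.loads((clip_dir / "annotation.json").read_text())["description"]

# 3D hand keypoints, assumed shape (frames, hands, joints, xyz) = (T, 2, 21, 3).
keypoints_3d = np.load(clip_dir / "keypoints_3d.npy")

# Hand shape and pose, assumed stored as MANO-style parameters per frame.
hand = np.load(clip_dir / "hand_params.npz")
betas, poses = hand["betas"], hand["poses"]

print(text, keypoints_3d.shape, poses.shape)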


Application: Text-driven Motion Generation

We showcase text-driven hand motion generation, enabled by training a generative model on GigaHands.
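
To make the interface concrete, here is a self-contained toy of the kind of text-conditioned diffusion sampler such a generative model could use at inference time. The denoiser is an untrained placeholder and all dimensions are illustrative; this is a sketch, not the paper's implementation:

import torch

T_STEPS, FRAMES, DIM = 50, 120, 2 * 21 * 3      # two hands x 21 joints x xyz

# Standard DDPM noise schedule.
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(x_t, t, text_emb):
    # Placeholder for a trained network that predicts the noise in x_t,
    # conditioned on the timestep and a text embedding.
    return torch.zeros_like(x_t)

text_emb = torch.randn(1, 512)                  # stand-in for a text encoder's output
x = torch.randn(1, FRAMES, DIM)                 # start from Gaussian noise
for t in reversed(range(T_STEPS)):
    eps = denoiser(x, t, text_emb)
    a, ab = alphas[t], alpha_bars[t]
    mean = (x - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
    x = mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean
# x is now a generated motion clip (here just noise, since the denoiser is untrained).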


Application: Motion Captioning

Within Dataset: We showcase 3D hand motion captioning by training a generative model on GigaHands.

In-the-Wild: Using only GigaHands for training, we enable 3D hand motion captioning on other datasets.


Application: Dynamic 3D Scene Reconstruction

GigaHands provides hand motions captured from 51 camera views, enabling dynamic 3D scene reconstruction. The examples below showcase frame-wise reconstruction results with 2D Gaussian Splatting (2DGS; Huang et al.).
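
Because camera poses are part of the annotations, the multi-view calibration can also be used directly; the sketch below projects 3D keypoints into one view with a standard pinhole model. The world-to-camera convention and variable names are assumptions about the released format:

import numpy as np

def project_to_view(points_world, R, t, K):
    # Assumes world-to-camera extrinsics x_cam = R @ x_world + t and a
    # pinhole intrinsic matrix K; adapt if the released convention differs.
    p_cam = points_world @ R.T + t        # world -> camera coordinates, (N, 3)
    p_img = p_cam @ K.T                   # apply intrinsics
    return p_img[:, :2] / p_img[:, 2:3]   # perspective divide -> (N, 2) pixels

# Example: identity extrinsics, 1000 px focal length, 1920x1080 principal point.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
print(project_to_view(np.array([[0.1, 0.0, 1.0]]), np.eye(3), np.zeros(3), K))
# -> [[1060.  540.]]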


Citation

@misc{fu2024gigahandsmassiveannotateddataset,
      title={GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities}, 
      author={Rao Fu and Dingxi Zhang and Alex Jiang and Wanjia Fu and Austin Funk and Daniel Ritchie and Srinath Sridhar},
      year={2024},
      eprint={2412.04244},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.04244}, 
}

Acknowledgements

This research was supported by AFOSR grant FA9550-21-1-0214, NSF CAREER grant #2143576, and ONR DURIP grant N00014-23-1-2804. We would like to thank the OpenAI Research Access Program for API support and extend our gratitude to Ellie Pavlick, Tianran Zhang, Carmen Yu, Angela Xing, Chandradeep Pokhariya, Sudarshan Harithas, Hongyu Li, Chaerin Min, Xindi Qu, Xiaoquan Liu, Hao Sun, Melvin He and Brandon Woodard.

Contact

Rao Fu (contact email)