ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes

1Robotics Research Center, IIIT-Hyderabad
2Stanford University
3Korea Advanced Institute of Science and Technology
4Brown University

Abstract

Progress in 3D object understanding has relied on manually "canonicalized" shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, e.g., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to canonicalize the 3D orientation and position of full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of 3D networks that are equivariant to permutation and rotation and invariant to translation. During inference, our method takes an unseen full or partial 3D point cloud at an arbitrary pose and outputs an equivariant canonical pose. During training, the network uses self-supervision losses to learn the canonical pose from an un-canonicalized collection of full and partial 3D point clouds. ConDor can also learn to consistently co-segment object parts without any supervision. Extensive quantitative results on four new metrics show that our approach outperforms existing methods while enabling new applications such as operation on depth images and annotation transfer.

Overview


We introduce ConDor, a method for self-supervised, category-level canonicalization of the 3D pose of partial shapes. It consists of a neural network trained on an un-canonicalized collection of 3D point clouds with inconsistent 3D poses. During inference, our method takes a full or partial 3D point cloud of an object at an arbitrary pose and outputs a canonical rotation frame and translation vector. To enable operation on instances from different categories, we build upon Tensor Field Networks (TFNs), a 3D point cloud architecture that is equivariant to 3D rotation and point permutation, and invariant to translation. To handle partial shapes, we use a two-branch (Siamese) network with training data that simulates partiality through shape slicing or camera projection. We introduce several losses that help our method learn to canonicalize 3D pose via self-supervision. A surprising feature of our method is the (optional) ability to learn consistent part co-segmentation across instances without any supervision.
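Two pieces of the pipeline above are easy to illustrate concretely: simulating partiality by slicing a point cloud with a random plane, and applying a predicted canonical frame (rotation plus translation) to a posed input. The sketch below is a minimal NumPy illustration of these two steps under assumed conventions (the helper names `slice_partial` and `canonicalize`, and the convention that the predicted rotation maps the input frame to the canonical frame, are ours, not from the paper); the actual method predicts the frame with a TFN-based network.

```python
import numpy as np

def slice_partial(points, rng=None, keep_min=0.25):
    """Simulate a partial shape by discarding points on one side of a
    random plane through the cloud (one of the partiality simulations
    described for training; camera projection is the other).

    points: (N, 3) array. Returns an (M, 3) subset with M <= N.
    """
    rng = np.random.default_rng() if rng is None else rng
    normal = rng.normal(size=3)
    normal /= np.linalg.norm(normal)          # random plane orientation
    depth = points @ normal                   # signed distance along normal
    # Keep a random fraction of the cloud (at least keep_min of it).
    thresh = np.quantile(depth, rng.uniform(keep_min, 1.0))
    return points[depth <= thresh]

def canonicalize(points, R, t):
    """Apply a predicted canonical frame: subtract the translation, then
    rotate into the canonical frame. Convention (assumed here):
    x_canonical = R @ (x - t), written row-wise for an (N, 3) array.
    """
    return (points - t) @ R.T
```

If the network's prediction is ideal, canonicalizing a rotated copy of a shape recovers the original canonical-frame points: for input `Y = X @ Q.T` (each point rotated by `Q`), `canonicalize(Y, Q.T, 0)` returns `X`.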


Qualitative Results


Acknowledgements


This work was supported by AFOSR grant FA9550-21-1-0214, a Google Research Scholar Award, a Vannevar Bush Faculty Fellowship, ARL grant W911NF2120104, and gifts from the Adobe and Autodesk corporations.

BibTeX

@InProceedings{sajnani2022_condor,
  author    = {Rahul Sajnani and
               Adrien Poulenard and
               Jivitesh Jain and
               Radhika Dua and
               Leonidas J. Guibas and
               Srinath Sridhar},
  title     = {ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2022}
}