Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation
- URL: http://arxiv.org/abs/2403.12728v1
- Date: Tue, 19 Mar 2024 13:43:27 GMT
- Title: Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation
- Authors: Jingtao Sun, Yaonan Wang, Mingtao Feng, Chao Ding, Mike Zheng Shou, Ajmal Saeed Mian
- Abstract summary: We introduce a diffusion-driven self-supervised network for multi-object shape reconstruction and categorical pose estimation.
Our method significantly outperforms state-of-the-art self-supervised category-level baselines and even surpasses some fully-supervised instance-level and category-level methods.
- Score: 26.982199143972835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully-supervised category-level pose estimation aims to determine the 6-DoF poses of unseen instances from known categories, which requires expensive manual labeling. Recently, various self-supervised category-level pose estimation methods have been proposed to reduce the need for annotated datasets. However, most methods rely on synthetic data or 3D CAD models for self-supervised training, and they are typically limited to single-object pose problems without considering multi-object tasks or shape reconstruction. To overcome these challenges and limitations, we introduce a diffusion-driven self-supervised network for multi-object shape reconstruction and categorical pose estimation, leveraging only the shape priors. Specifically, to capture the SE(3)-equivariant pose features and 3D scale-invariant shape information, we present a Prior-Aware Pyramid 3D Point Transformer in our network. This module adopts a point convolutional layer with radial kernels for pose-aware learning and a 3D scale-invariant graph convolution layer for object-level shape representation, respectively. Furthermore, we introduce a pretrain-to-refine self-supervised training paradigm to train our network. It enables the proposed network to capture the associations between shape priors and observations, addressing the challenge of intra-class shape variations by utilising the diffusion mechanism. Extensive experiments conducted on four public datasets and a self-built dataset demonstrate that our method significantly outperforms state-of-the-art self-supervised category-level baselines and even surpasses some fully-supervised instance-level and category-level methods.
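The abstract's terms "SE(3)-equivariant" and "scale-invariant" can be illustrated with a toy point cloud. The sketch below is not the paper's network or any part of its code; it only shows, under simple assumptions, what the two properties mean. The helper names (`rotation_z`, `apply_pose`, `centroid`, `shape_descriptor`) and the example values are hypothetical and chosen for illustration. The centroid stands in for an equivariant feature (it moves with the pose), and normalized pairwise distances stand in for a pose- and scale-invariant shape descriptor.

```python
import math

def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(points, R, t, scale=1.0):
    """Map each point p to scale * R @ p + t (rotation, translation, size)."""
    return [
        tuple(scale * sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def centroid(points):
    """Mean of the points; a simple feature that follows the pose."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def shape_descriptor(points):
    """Sorted pairwise distances divided by the largest distance."""
    d = sorted(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])
    return [x / d[-1] for x in d]

# Hypothetical toy object and pose (illustrative values only).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
R = rotation_z(math.pi / 3)
t = (0.5, -1.0, 2.0)
moved = apply_pose(points, R, t, scale=2.5)

# Equivariance: the feature of the moved cloud equals the moved feature.
lhs = centroid(moved)
rhs = apply_pose([centroid(points)], R, t, scale=2.5)[0]
equivariant = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(lhs, rhs))

# Invariance: the descriptor ignores rotation, translation, and scale.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(shape_descriptor(points), shape_descriptor(moved))
)

print(equivariant, invariant)
```

In this toy setting both checks hold because rotation and translation preserve distances and the uniform scale cancels after normalization. A learned network has to obtain the analogous properties through its layer design, which is the role the abstract assigns to the radial-kernel point convolution and the scale-invariant graph convolution.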
Related papers
- 3D Shape Completion on Unseen Categories: A Weakly-supervised Approach [61.76304400106871]
We introduce a novel weakly-supervised framework to reconstruct complete shapes from unseen categories.
We first propose an end-to-end prior-assisted shape learning network that leverages data from the seen categories to infer a coarse shape.
In addition, we propose a self-supervised shape refinement model to further refine the coarse shape.
arXiv Detail & Related papers (2024-01-19T09:41:09Z)
- A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction [1.3812010983144802]
Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images.
This task requires significant data acquisition to predict both visible and occluded portions of the shape.
We propose a continual learning-based 3D reconstruction method where our goal is to design a model using Variational Priors that can still reconstruct the previously seen classes reasonably even after training on new classes.
arXiv Detail & Related papers (2023-08-17T06:48:55Z)
- DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field [29.42222066097076]
Estimating 6D poses and reconstructing 3D shapes of objects in open-world scenes from RGB-depth image pairs is challenging.
We propose the DTF-Net, a novel framework for pose estimation and shape reconstruction based on implicit neural fields of object categories.
arXiv Detail & Related papers (2023-08-04T10:35:40Z)
- Weakly-supervised 3D Pose Transfer with Keypoints [57.66991032263699]
Main challenges of 3D pose transfer are: 1) Lack of paired training data with different characters performing the same pose; 2) Disentangling pose and shape information from the target mesh; 3) Difficulty in applying to meshes with different topologies.
We propose a novel weakly-supervised keypoint-based framework to overcome these difficulties.
arXiv Detail & Related papers (2023-07-25T12:40:24Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction to study model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Towards Self-Supervised Category-Level Object Pose and Size Estimation [121.28537953301951]
This work presents a self-supervised framework for category-level object pose and size estimation from a single depth image.
We leverage the geometric consistency residing in point clouds of the same shape for self-supervision.
arXiv Detail & Related papers (2022-03-06T06:02:30Z)
- Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation [30.04752448942084]
Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models.
We propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.
arXiv Detail & Related papers (2021-10-30T06:46:44Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)