DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via
Deformable Template Field
- URL: http://arxiv.org/abs/2308.02239v1
- Date: Fri, 4 Aug 2023 10:35:40 GMT
- Title: DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via
Deformable Template Field
- Authors: Haowen Wang, Zhipeng Fan, Zhen Zhao, Zhengping Che, Zhiyuan Xu, Dong
Liu, Feifei Feng, Yakun Huang, Xiuquan Qiao, Jian Tang
- Abstract summary: Estimating 6D poses and reconstructing 3D shapes of objects in open-world scenes from RGB-depth image pairs is challenging.
We propose the DTF-Net, a novel framework for pose estimation and shape reconstruction based on implicit neural fields of object categories.
- Score: 29.42222066097076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 6D poses and reconstructing 3D shapes of objects in open-world
scenes from RGB-depth image pairs is challenging. Many existing methods rely on
learning geometric features that correspond to specific templates while
disregarding shape variations and pose differences among objects in the same
category. As a result, these methods underperform when handling unseen object
instances in complex environments. In contrast, other approaches aim to achieve
category-level estimation and reconstruction by leveraging normalized geometric
structure priors, but the static prior-based reconstruction struggles with
substantial intra-class variations. To solve these problems, we propose the
DTF-Net, a novel framework for pose estimation and shape reconstruction based
on implicit neural fields of object categories. In DTF-Net, we design a
deformable template field to represent the general category-wise shape latent
features and intra-category geometric deformation features. The field
establishes continuous shape correspondences, deforming the category template
into arbitrary observed instances to accomplish shape reconstruction. We
introduce a pose regression module that shares the deformation features and
template codes from the fields to estimate the accurate 6D pose of each object
in the scene. We integrate a multi-modal representation extraction module to
extract object features and semantic masks, enabling end-to-end inference.
Moreover, during training, we implement a shape-invariant training strategy and
a viewpoint sampling method to further enhance the model's capability to
extract object pose features. Extensive experiments on the REAL275 and CAMERA25
datasets demonstrate the superiority of DTF-Net in both synthetic and real
scenes. Furthermore, we show that DTF-Net effectively supports grasping tasks
with a real robot arm.
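To make the described architecture more concrete, below is a minimal PyTorch sketch of how a deformable template field with a shared pose regression head could be wired up. It is an illustration under assumptions, not the authors' implementation: the module names (DeformableTemplateField, PoseHead, mlp), the latent dimensions, and the offset-plus-correction deformation design are placeholders inspired by deformed-implicit-field style models, and the multi-modal feature extraction and training strategies of DTF-Net are not reproduced here.

```python
import torch
import torch.nn as nn


def mlp(in_dim, hidden, out_dim, layers=3):
    """Plain MLP with ReLU activations (helper for this sketch)."""
    mods, d = [], in_dim
    for _ in range(layers - 1):
        mods += [nn.Linear(d, hidden), nn.ReLU(inplace=True)]
        d = hidden
    mods.append(nn.Linear(d, out_dim))
    return nn.Sequential(*mods)


class DeformableTemplateField(nn.Module):
    """Category-level template SDF plus a per-instance deformation branch.

    A deformation network, conditioned on an instance latent code, maps
    observed-space query points into the template's canonical space (and
    predicts a small per-point SDF correction), so that a single learned
    template can explain many instances of the same category.
    """

    def __init__(self, latent_dim=256, hidden=256):
        super().__init__()
        self.template_code = nn.Parameter(torch.zeros(latent_dim))  # shared per category
        self.deform_net = mlp(3 + latent_dim, hidden, 3 + 1)        # xyz offset + SDF correction
        self.template_sdf = mlp(3 + latent_dim, hidden, 1)          # SDF in template space

    def forward(self, query_xyz, instance_code):
        # query_xyz: (B, N, 3); instance_code: (B, latent_dim)
        B, N, _ = query_xyz.shape
        inst = instance_code.unsqueeze(1).expand(B, N, -1)
        out = self.deform_net(torch.cat([query_xyz, inst], dim=-1))
        offset, correction = out[..., :3], out[..., 3:]
        canonical_xyz = query_xyz + offset                          # continuous shape correspondence
        tmpl = self.template_code.view(1, 1, -1).expand(B, N, -1)
        sdf = self.template_sdf(torch.cat([canonical_xyz, tmpl], dim=-1)) + correction
        return sdf, offset


class PoseHead(nn.Module):
    """Pose regression head that shares the deformation/template latents,
    predicting a 6D rotation representation, translation, and scale."""

    def __init__(self, latent_dim=256, hidden=256):
        super().__init__()
        self.head = mlp(2 * latent_dim, hidden, 6 + 3 + 1)

    def forward(self, instance_code, template_code):
        tmpl = template_code.unsqueeze(0).expand(instance_code.shape[0], -1)
        out = self.head(torch.cat([instance_code, tmpl], dim=-1))
        return out[..., :6], out[..., 6:9], out[..., 9:]  # rot6d, translation, scale


# Toy usage: one object, 1024 SDF queries sampled around the observed points.
field, pose_head = DeformableTemplateField(), PoseHead()
queries = torch.rand(1, 1024, 3) * 2 - 1
z_instance = torch.randn(1, 256)
sdf, offsets = field(queries, z_instance)
rot6d, trans, scale = pose_head(z_instance, field.template_code)
print(sdf.shape, rot6d.shape)  # torch.Size([1, 1024, 1]) torch.Size([1, 6])
```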
Related papers
- Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation [26.982199143972835]
We introduce a diffusion-driven self-supervised network for multi-object shape reconstruction and categorical pose estimation.
Our method significantly outperforms state-of-the-art self-supervised category-level baselines and even surpasses some fully-supervised instance-level and category-level methods.
arXiv Detail & Related papers (2024-03-19T13:43:27Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- DeFormer: Integrating Transformers with Deformable Models for 3D Shape Abstraction from a Single Image [31.154786931081087]
We propose a novel bi-channel Transformer architecture, integrated with parameterized deformable models, to simultaneously estimate the global and local deformations of primitives.
DeFormer achieves better reconstruction accuracy than the state of the art and produces visualizations with consistent semantic correspondences for improved interpretability.
arXiv Detail & Related papers (2023-09-22T02:46:43Z)
- Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization on the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction with a focus on model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation [30.04752448942084]
Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models.
We propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.
arXiv Detail & Related papers (2021-10-30T06:46:44Z)
- Multi-Category Mesh Reconstruction From Image Collections [90.24365811344987]
We present an alternative approach that infers the textured mesh of objects by combining a set of deformable 3D models with instance-specific deformation, pose, and texture.
Our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision.
Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner.
arXiv Detail & Related papers (2021-10-21T16:32:31Z)
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable, and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
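For contrast with DTF-Net's continuous deformable field, here is a minimal sketch of the static shape-prior deformation idea referenced in the last entry above and in the abstract's discussion of prior-based methods: a fixed categorical prior point cloud is deformed by per-point offsets predicted from an observation feature, and pose plus size can then be read out by a closed-form similarity alignment between canonical and camera-frame correspondences. The module name PriorDeformer, the feature dimensions, and the use of Umeyama alignment are assumptions for illustration, not details taken from the cited paper.

```python
import torch
import torch.nn as nn


class PriorDeformer(nn.Module):
    """Sketch: deform a fixed categorical prior point cloud into the observed
    instance using per-point offsets predicted from a global observation feature."""

    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),
        )

    def forward(self, prior_pts, obs_feat):
        # prior_pts: (B, N, 3) canonical prior; obs_feat: (B, feat_dim)
        feat = obs_feat.unsqueeze(1).expand(-1, prior_pts.shape[1], -1)
        offsets = self.net(torch.cat([prior_pts, feat], dim=-1))
        return prior_pts + offsets  # reconstructed instance model in canonical space


def umeyama_similarity(src, dst):
    """Closed-form similarity transform (scale s, rotation R, translation t)
    mapping src -> dst, both (N, 3). Aligning canonical points to camera-frame
    points this way is a common readout of 6D pose and size in this line of work."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.t() @ xs / src.shape[0]
    U, S, Vt = torch.linalg.svd(cov)
    d = torch.sign(torch.det(U @ Vt)).item()       # fix a possible reflection
    R = U @ torch.diag(torch.tensor([1.0, 1.0, d])) @ Vt
    var_src = (xs ** 2).sum() / src.shape[0]
    s = (S * torch.tensor([1.0, 1.0, d])).sum() / var_src
    t = mu_d - s * (R @ mu_s)
    return s, R, t


# Toy usage: recover a random similarity transform from exact correspondences.
canonical = torch.randn(500, 3)
R_gt = torch.linalg.qr(torch.randn(3, 3)).Q
if torch.det(R_gt) < 0:
    R_gt[:, 0] *= -1                               # make it a proper rotation
observed = 0.7 * canonical @ R_gt.t() + torch.tensor([0.1, -0.2, 0.5])
s, R, t = umeyama_similarity(canonical, observed)
print(float(s), t)                                 # ~0.7 and ~[0.1, -0.2, 0.5]
```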
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.