DProST: 6-DoF Object Pose Estimation Using Space Carving and Dynamic Projective Spatial Transformer
- URL: http://arxiv.org/abs/2112.08775v1
- Date: Thu, 16 Dec 2021 10:39:09 GMT
- Title: DProST: 6-DoF Object Pose Estimation Using Space Carving and Dynamic Projective Spatial Transformer
- Authors: Jaewoo Park, Nam Ik Cho
- Abstract summary: Most deep learning-based pose estimation methods require CAD data, either to build 3D intermediate representations or to project the object's 2D appearance.
We propose a new pose estimation system consisting of a space carving module that reconstructs a reference 3D feature to replace the CAD data.
- We also overcome the self-occlusion problem with a new Bidirectional Z-buffering (BiZ-buffer) method, which extracts both the front view and the self-occluded back view of the object.
- Score: 20.291172201922084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the pose of an object is a core computer vision task. Most deep
learning-based pose estimation methods require CAD data, either to build 3D
intermediate representations or to project the object's 2D appearance.
However, these methods cannot be used when CAD data for the objects of
interest are unavailable. Moreover, existing methods do not precisely reflect
perspective distortion in the learning process, and information loss due to
self-occlusion has not been well studied. In this regard, we propose a new
pose estimation system
consisting of a space carving module that reconstructs a reference 3D feature
to replace the CAD data. In addition, our new transformation module, Dynamic
Projective Spatial Transformer (DProST), transforms a reference 3D feature to
reflect the pose while considering perspective distortion. Also, we overcome
the self-occlusion problem with a new Bidirectional Z-buffering (BiZ-buffer)
method, which extracts both the front view and the self-occluded back view of
the object. Lastly, we suggest a Perspective Grid Distance Loss (PGDL),
enabling stable learning of the pose estimator without CAD data. Experimental
results show that our method outperforms the state-of-the-art method on the
LINEMOD dataset and achieves comparable performance on the LINEMOD-OCCLUSION
dataset, even compared to methods that require CAD data during network training.
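As a rough illustration of two of the ideas above, the sketch below builds a perspective-aware sampling grid from a pose and camera intrinsics (the DProST idea of transforming a reference 3D feature under perspective), then reads out the first and last occupied samples along each pixel ray (the BiZ-buffer idea of front and back views). All tensor shapes, the depth range, and the occupancy threshold are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch in the spirit of DProST, using PyTorch.
import torch
import torch.nn.functional as F

def perspective_grid(K, R, t, H, W, depth_range, n_samples):
    """Cast a ray through every pixel, sample points along each ray in
    camera space, and map them into the object's reference frame."""
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)  # (H, W, 3)
    rays = pix @ torch.linalg.inv(K).T                        # ray directions
    z = torch.linspace(*depth_range, n_samples)               # depths along ray
    pts_cam = rays[None] * z[:, None, None, None]             # (D, H, W, 3)
    # Camera -> object frame: X_obj = R^T (X_cam - t)
    return (pts_cam - t) @ R                                  # (D, H, W, 3)

# Usage: sample a reference 3D feature along the grid, then keep the nearest
# and farthest "occupied" sample per ray (a BiZ-buffer-like readout).
K = torch.tensor([[320., 0., 128.], [0., 320., 128.], [0., 0., 1.]])
R, t = torch.eye(3), torch.tensor([0., 0., 0.6])
feat = torch.rand(1, 8, 32, 32, 32)                           # reference 3D feature
grid = perspective_grid(K, R, t, 64, 64, (0.4, 0.8), 24)      # assumed in [-1, 1]
sampled = F.grid_sample(feat, grid[None], align_corners=True) # (1, 8, D, H, W)
occ = sampled.norm(dim=1, keepdim=True) > 0.5                 # crude occupancy
front_idx = occ.float().argmax(dim=2, keepdim=True)           # first hit per ray
back_idx = occ.shape[2] - 1 - occ.flip(2).float().argmax(dim=2, keepdim=True)
front = sampled.gather(2, front_idx.expand(-1, 8, -1, -1, -1)).squeeze(2)
back = sampled.gather(2, back_idx.expand(-1, 8, -1, -1, -1)).squeeze(2)
```

Sampling the reference volume with a grid built from explicit pixel rays is what makes the transformation perspective-aware, in contrast to the affine grids of a standard spatial transformer.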
Related papers
- Object Gaussian for Monocular 6D Pose Estimation from Sparse Views [4.290993205307184]
We introduce SGPose, a novel framework for sparse view object pose estimation using Gaussian-based methods.
Given as few as ten views, SGPose generates a geometry-aware representation by starting from a random cuboid.
Experiments on typical benchmarks, especially on the Occlusion LM-O dataset, demonstrate that SGPose outperforms existing methods even under sparse view constraints.
arXiv Detail & Related papers (2024-09-04T10:03:11Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs [24.09764733540401]
We develop a new method to automatically convert 2D line drawings from three orthographic views into 3D CAD models.
We leverage the attention mechanism in a Transformer-based sequence generation model to learn flexible mappings between the input and output.
Our method significantly outperforms existing ones when the inputs are noisy or incomplete.
arXiv Detail & Related papers (2023-08-10T17:59:34Z)
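As a hedged sketch of the sequence-generation setup the PlankAssembly summary describes, the snippet below maps tokenized line drawings from three orthographic views to shape-program tokens with a standard Transformer encoder-decoder. The vocabulary sizes, model dimensions, and tokenization are assumptions for illustration, not the paper's design.

```python
# A minimal drawing-tokens -> shape-program-tokens seq2seq sketch.
import torch
import torch.nn as nn

class Drawing2Program(nn.Module):
    def __init__(self, in_vocab=512, out_vocab=512, d_model=256):
        super().__init__()
        self.src_emb = nn.Embedding(in_vocab, d_model)
        self.tgt_emb = nn.Embedding(out_vocab, d_model)
        self.seq2seq = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, out_vocab)

    def forward(self, src_tokens, tgt_tokens):
        # Causal mask so each output token attends only to earlier ones.
        mask = self.seq2seq.generate_square_subsequent_mask(tgt_tokens.size(1))
        h = self.seq2seq(
            self.src_emb(src_tokens), self.tgt_emb(tgt_tokens), tgt_mask=mask
        )
        return self.head(h)  # per-step logits over shape-program tokens

# Tokens from the three orthographic views concatenated into one source sequence.
model = Drawing2Program()
src = torch.randint(0, 512, (2, 3 * 40))  # (batch, drawing tokens)
tgt = torch.randint(0, 512, (2, 60))      # (batch, program tokens so far)
logits = model(src, tgt)                  # (2, 60, 512)
```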
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
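The FrozenRecon summary describes test-time optimization around a frozen depth model; a minimal sketch of that pattern is to keep the network's raw depths fixed and optimize only a few per-frame alignment parameters. The per-frame scale/shift parameterization and the placeholder consistency loss below are assumptions, not the paper's actual objective.

```python
# Test-time optimization sketch: frozen depths, learnable per-frame alignment.
import torch

depths = torch.rand(4, 1, 64, 64)               # frozen network's raw depths
log_scale = torch.zeros(4, requires_grad=True)  # per-frame scale (log-space)
shift = torch.zeros(4, requires_grad=True)      # per-frame shift

opt = torch.optim.Adam([log_scale, shift], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    d = depths * log_scale.exp().view(-1, 1, 1, 1) + shift.view(-1, 1, 1, 1)
    # Placeholder consistency term: adjacent frames should agree after
    # alignment (a real system would warp via estimated poses instead),
    # plus a regularizer that keeps scales from collapsing to zero.
    loss = (d[1:] - d[:-1]).abs().mean() + 0.1 * log_scale.pow(2).mean()
    loss.backward()
    opt.step()
```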
- Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model the feature transformation from image view to bird's eye view (BEV).
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume into the BEV frame based on the 3D space occupancy derived from depth.
arXiv Detail & Related papers (2023-07-09T06:07:22Z)
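The two steps in the summary above (lifting by a predicted depth distribution, then aggregating to BEV) can be sketched as a distribution-weighted outer product followed by a pooling step. The shapes, bin counts, and flat collapse to BEV below are illustrative assumptions; a real system would resample the camera frustum into ego-vehicle voxels.

```python
# Depth-distribution-based feature lifting, sketched in PyTorch.
import torch

B, C, H, W, D = 1, 64, 32, 88, 48
feat = torch.rand(B, C, H, W)             # 2D image features
depth_logits = torch.rand(B, D, H, W)     # predicted per-pixel depth bins
depth_prob = depth_logits.softmax(dim=1)  # parametric depth distribution

# Outer product: each pixel's feature is spread over depth bins by its
# predicted probability, giving a camera-frustum feature volume.
volume = feat.unsqueeze(2) * depth_prob.unsqueeze(1)  # (B, C, D, H, W)

# Collapse the image-height axis to get a BEV-style map over (depth, width).
bev = volume.sum(dim=3)                                # (B, C, D, W)
```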
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
- Self-supervised Wide Baseline Visual Servoing via 3D Equivariance [35.93323183558956]
This paper presents a novel self-supervised visual servoing method for wide baseline images.
Existing approaches that regress absolute camera pose with respect to an object require 3D ground truth data of the object.
Our method yields more than a 35% reduction in average distance error and a success rate above 90% at a 3 cm error tolerance.
arXiv Detail & Related papers (2022-09-12T17:38:26Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that efficiently perceives global geometric inconsistencies between 3D structures.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method improves the detection performance of the state-of-the-art monocular method by 2.80% on the moderate test setting, without requiring extra data.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
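Projective modeling of this kind rests on the standard pinhole relation between an object's physical height H, its projected height h, and its depth z: h = f * H / z, hence z = f * H / h. A small worked example (the numbers are illustrative, not from the paper):

```python
# Recover depth from focal length, 3D object height, and 2D box height.
def depth_from_height(f_pixels: float, height_3d_m: float, height_2d_px: float) -> float:
    """Pinhole relation: z = f * H / h."""
    return f_pixels * height_3d_m / height_2d_px

# A car ~1.5 m tall appearing 60 px tall under a 720 px focal length:
z = depth_from_height(720.0, 1.5, 60.0)  # -> 18.0 m
print(z)
```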
- Sparse Pose Trajectory Completion [87.31270669154452]
We propose a method to complete pose trajectories even when learning from a dataset where objects appear only in sparsely sampled views.
This is achieved with a cross-modal pose trajectory transfer mechanism.
Our method is evaluated on the Pix3D and ShapeNet datasets.
arXiv Detail & Related papers (2021-05-01T00:07:21Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
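As a rough sketch of conditioning a view-independent pose representation on camera projection operators, the snippet below projects one shared set of 3D joints into each calibrated view with a pinhole model. The joint count, the random projection matrices, and the direct use of 3D joints as the shared representation are illustrative assumptions.

```python
# Project a shared 3D pose into each camera view.
import torch

J, V = 17, 4                  # joints, camera views
pose_3d = torch.rand(J, 3)    # unified latent: 3D joints in world coordinates
P = torch.rand(V, 3, 4)       # calibrated per-view projection matrices

homog = torch.cat([pose_3d, torch.ones(J, 1)], dim=1)  # (J, 4) homogeneous
proj = torch.einsum("vij,kj->vki", P, homog)           # (V, J, 3)
kps_2d = proj[..., :2] / proj[..., 2:3]                # perspective divide
print(kps_2d.shape)                                     # (4, 17, 2)
```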
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.