Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose
Estimation
- URL: http://arxiv.org/abs/2203.14531v1
- Date: Mon, 28 Mar 2022 07:05:27 GMT
- Title: Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose
Estimation
- Authors: Xiaoke Jiang, Donghai Li, Hao Chen, Ye Zheng, Rui Zhao and Liwei Wu
- Abstract summary: State-of-the-art approaches typically use different backbones to extract features for RGB and depth images.
We find that the essential reason for using two independent backbones is the "projection breakdown" problem.
We propose a simple yet effective method denoted as Uni6D that explicitly takes the extra UV data along with RGB-D images as input.
- Score: 21.424035166174352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As RGB-D sensors become more affordable, using RGB-D images to obtain
high-accuracy 6D pose estimation results becomes a better option.
State-of-the-art approaches typically use different backbones to extract
features for RGB and depth images. They use a 2D CNN for RGB images and a
per-pixel point cloud network for depth data, as well as a fusion network for
feature fusion. We find that the essential reason for using two independent
backbones is the "projection breakdown" problem. In the depth image plane, the
projected 3D structure of the physical world is preserved by the 1D depth value
and its built-in 2D pixel coordinate (UV). Any spatial transformation that
modifies UV, such as resize, flip, crop, or pooling operations in the CNN
pipeline, breaks the binding between the pixel value and UV coordinate. As a
consequence, the 3D structure is no longer preserved by a modified depth image
or feature. To address this issue, we propose a simple yet effective method
denoted as Uni6D that explicitly takes the extra UV data along with RGB-D
images as input. Our method uses a unified CNN framework for 6D pose
estimation with a single CNN backbone. In particular, the architecture is
based on Mask R-CNN with two extra heads: an RT head that directly predicts
the 6D pose, and an auxiliary abc head that guides the network to map the
visible points to their coordinates in the 3D model. This end-to-end approach
balances simplicity and accuracy, achieving accuracy comparable to the state
of the art with 7.2x faster inference speed on the YCB-Video dataset.
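The input construction the abstract describes, appending per-pixel UV coordinate channels to the RGB-D image so that spatial transforms move the coordinates together with the depth values, can be sketched as follows. This is a minimal illustration in NumPy; the function name and the [0, 1] normalization are assumptions for the example, not details taken from the paper.

```python
import numpy as np

def add_uv_channels(rgbd):
    """Append per-pixel UV coordinate channels to an (H, W, 4) RGB-D image.

    Sketch of the idea behind Uni6D's input: once UV is stored as explicit
    channels, operations such as flips or crops transform UV together with
    depth, instead of silently breaking the depth-to-coordinate binding.
    """
    h, w = rgbd.shape[:2]
    # v indexes rows (y), u indexes columns (x); normalize to [0, 1].
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    uv = np.stack([u / max(w - 1, 1), v / max(h - 1, 1)], axis=-1)
    return np.concatenate([rgbd, uv.astype(rgbd.dtype)], axis=-1)  # (H, W, 6)

rgbd = np.random.rand(4, 5, 4).astype(np.float32)
x = add_uv_channels(rgbd)
# A horizontal flip now carries each pixel's original u coordinate along
# with its depth value, so the 3D structure remains recoverable.
flipped = x[:, ::-1, :]
```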
Related papers
- Towards Two-view 6D Object Pose Estimation: A Comparative Study on
Fusion Strategy [16.65699606802237]
Current RGB-based 6D object pose estimation methods have achieved noticeable performance on datasets and real world applications.
This paper proposes a framework for 6D object pose estimation that learns implicit 3D information from 2 RGB images.
arXiv Detail & Related papers (2022-07-01T08:22:34Z)
- Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z)
- Deep Camera Pose Regression Using Pseudo-LiDAR [1.5959408994101303]
We show that converting depth maps into pseudo-LiDAR signals is a better representation for camera localization tasks.
We propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose.
arXiv Detail & Related papers (2022-02-28T20:30:37Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation [98.83762558394345]
SO-Pose is a framework for regressing all 6 degrees-of-freedom (6DoF) for the object pose in a cluttered environment from a single RGB image.
We introduce a novel reasoning about self-occlusion, in order to establish a two-layer representation for 3D objects.
By enforcing cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we can further improve accuracy and robustness.
arXiv Detail & Related papers (2021-08-18T19:49:29Z)
- FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation [54.666329929930455]
We present FFB6D, a Bidirectional fusion network designed for 6D pose estimation from a single RGBD image.
We learn to combine appearance and geometry information for representation learning as well as output representation selection.
Our method outperforms the state-of-the-art by large margins on several benchmarks.
arXiv Detail & Related papers (2021-03-03T08:07:29Z)
- Depth-Adapted CNN for RGB-D cameras [0.3727773051465455]
Conventional 2D Convolutional Neural Networks (CNN) extract features from an input image by applying linear filters.
We tackle the problem of improving the classical RGB CNN methods by using the depth information provided by the RGB-D cameras.
This paper proposes a novel and generic procedure to articulate both photometric and geometric information in CNN architecture.
arXiv Detail & Related papers (2020-09-21T15:58:32Z)
- Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem [98.92148855291363]
This paper proposes a deep CNN model which simultaneously solves for both the 6-DoF absolute camera pose and 2D--3D correspondences.
Tests on both real and simulated data have shown that our method substantially outperforms existing approaches.
arXiv Detail & Related papers (2020-03-15T04:17:30Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
- L6DNet: Light 6 DoF Network for Robust and Precise Object Pose Estimation with Small Datasets [0.0]
We propose a novel approach to perform 6 DoF object pose estimation from a single RGB-D image.
We adopt a hybrid pipeline in two stages: data-driven and geometric.
Our approach is more robust and accurate than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-03T17:41:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.