Neural Mesh Refiner for 6-DoF Pose Estimation
- URL: http://arxiv.org/abs/2003.07561v3
- Date: Thu, 26 Mar 2020 10:14:40 GMT
- Title: Neural Mesh Refiner for 6-DoF Pose Estimation
- Authors: Di Wu, Yihao Chen, Xianbiao Qi, Yongjian Yu, Weixuan Chen, and Rong
Xiao
- Abstract summary: Deep learning has been shown to be effective for robust and real-time monocular pose estimation.
This paper bridges the gap between 2D mask generation and 3D location prediction via a differentiable neural mesh.
- Score: 10.62836310872743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How can we effectively utilise the 2D monocular image information for
recovering the 6D pose (6-DoF) of visual objects? Deep learning has been shown
to be effective for robust and real-time monocular pose estimation. Oftentimes,
the network learns to regress the 6-DoF pose using a naive loss function.
However, due to a lack of geometrical scene understanding from the directly
regressed pose estimation, there are misalignments between the rendered mesh
from the 3D object and the 2D instance segmentation results, e.g., bounding
box and mask predictions. This paper bridges the gap between 2D mask
generation and 3D location prediction via a differentiable neural mesh
renderer. We utilise the overlay between the accurate mask prediction and less
accurate mesh prediction to iteratively optimise the directly regressed 6D pose
information with a focus on translation estimation. By leveraging geometry, we
demonstrate that our technique significantly improves direct regression
performance on the difficult task of translation estimation and achieves
state-of-the-art results on the Peking University/Baidu - Autonomous Driving
dataset and the ApolloScape 3D Car Instance dataset. The code can be found at
\url{https://bit.ly/2IRihfU}.
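The refinement idea in the abstract can be reduced to a small loop: render the object at the current translation estimate, score the overlap against the (more accurate) segmentation mask, and update the translation to increase that overlap. The sketch below is a toy illustration only, under loud assumptions: a square silhouette stands in for the differentiable neural mesh renderer, a greedy per-pixel search stands in for gradient-based optimisation, and all function names are invented for this example, not taken from the authors' code.

```python
import numpy as np

def render_silhouette(center, size=10, canvas=64):
    """Toy stand-in for the differentiable mesh renderer: draws a
    square silhouette centred at (row, col) on a blank canvas."""
    mask = np.zeros((canvas, canvas), dtype=float)
    r, c = int(round(center[0])), int(round(center[1]))
    h = size // 2
    mask[max(r - h, 0):r + h, max(c - h, 0):c + h] = 1.0
    return mask

def mask_iou(a, b):
    """Intersection-over-union of two binary masks; this overlap score
    plays the role of the mask/mesh alignment objective."""
    inter = np.logical_and(a > 0.5, b > 0.5).sum()
    union = np.logical_or(a > 0.5, b > 0.5).sum()
    return inter / max(union, 1)

def refine_translation(target_mask, init_center, max_steps=100):
    """Iteratively nudge the 2D translation so the rendered silhouette
    overlaps the accurate segmentation mask: a greedy pixel search
    standing in for the paper's gradient-based optimisation."""
    center = tuple(init_center)
    best = mask_iou(render_silhouette(center), target_mask)
    for _ in range(max_steps):
        candidates = [
            (mask_iou(render_silhouette((center[0] + dr, center[1] + dc)),
                      target_mask), (center[0] + dr, center[1] + dc))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        ]
        score, nxt = max(candidates)
        if score <= best:  # no neighbouring shift improves the overlap
            break
        best, center = score, nxt
    return center, best

# A directly regressed translation is off by several pixels; the
# overlap-driven refinement walks it back onto the mask.
target = render_silhouette((40, 40))
center, iou = refine_translation(target, init_center=(34, 34))
```

In the paper the renderer is differentiable, so the translation update comes from backpropagating an alignment loss rather than from a discrete search; the loop structure, however, is the same: render, compare against the 2D mask, update the translation, repeat.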
Related papers
- Improving 2D-3D Dense Correspondences with Diffusion Models for 6D
Object Pose Estimation [9.760487761422326]
Estimating 2D-3D correspondences between RGB images and 3D space is a fundamental problem in 6D object pose estimation.
Recent pose estimators use dense correspondence maps and Perspective-n-Point (PnP) algorithms to estimate object poses.
Recent advancements in image-to-image translation have made diffusion models the superior choice when evaluated on benchmark datasets.
arXiv Detail & Related papers (2024-02-09T14:27:40Z) - Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation [14.469317161361202]
We propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information.
We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly.
arXiv Detail & Related papers (2023-08-19T13:52:18Z) - FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z) - Sampling is Matter: Point-guided 3D Human Mesh Reconstruction [0.0]
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image.
Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
arXiv Detail & Related papers (2023-04-19T08:45:26Z) - ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose
Estimation [76.31125154523056]
We present a discrete descriptor, which can represent the object surface densely.
We also propose a coarse to fine training strategy, which enables fine-grained correspondence prediction.
arXiv Detail & Related papers (2022-03-17T16:16:24Z) - NeRF-Pose: A First-Reconstruct-Then-Regress Approach for
Weakly-supervised 6D Object Pose Estimation [44.42449011619408]
We present a weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D object segmentation and known relative camera poses during training.
A NeRF-enabled RANSAC algorithm is used to estimate a stable and accurate pose from the predicted correspondences.
Experiments on LineMod-Occlusion show that the proposed method has state-of-the-art accuracy in comparison to the best 6D pose estimation methods.
arXiv Detail & Related papers (2022-03-09T15:28:02Z) - VR3Dense: Voxel Representation Learning for 3D Object Detection and
Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
It takes a LiDAR point cloud and a single RGB image as inputs during inference and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
arXiv Detail & Related papers (2021-04-13T04:25:54Z) - FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose
Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z) - Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh
Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.