Neural Mesh Refiner for 6-DoF Pose Estimation
- URL: http://arxiv.org/abs/2003.07561v3
- Date: Thu, 26 Mar 2020 10:14:40 GMT
- Title: Neural Mesh Refiner for 6-DoF Pose Estimation
- Authors: Di Wu, Yihao Chen, Xianbiao Qi, Yongjian Yu, Weixuan Chen, and Rong
Xiao
- Abstract summary: Deep learning has been shown to be effective for robust and real-time monocular pose estimation.
This paper bridges the gap between 2D mask generation and 3D location prediction via a differentiable neural mesh.
- Score: 10.62836310872743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How can we effectively utilise the 2D monocular image information for
recovering the 6D pose (6-DoF) of visual objects? Deep learning has been shown
to be effective for robust and real-time monocular pose estimation. Oftentimes,
the network learns to regress the 6-DoF pose using a naive loss function.
However, due to a lack of geometrical scene understanding from the directly
regressed pose estimation, there are misalignments between the rendered mesh
from the 3D object and the 2D instance segmentation results, e.g., bounding
box and mask predictions. This paper bridges the gap between 2D mask
generation and 3D location prediction via a differentiable neural mesh
renderer. We utilise the overlay between the accurate mask prediction and less
accurate mesh prediction to iteratively optimise the directly regressed 6D pose
information with a focus on translation estimation. By leveraging geometry, we
demonstrate that our technique significantly improves direct regression
performance on the difficult task of translation estimation and achieves
state-of-the-art results on the Peking University/Baidu - Autonomous Driving
dataset and the ApolloScape 3D Car Instance dataset. The code can be found at
\url{https://bit.ly/2IRihfU}.
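The refinement idea in the abstract can be reduced to a small loop: render the object at the current translation estimate, score the overlap against the (more accurate) segmentation mask, and update the translation to increase that overlap. The sketch below is a toy illustration only, under loud assumptions: a square silhouette stands in for the differentiable neural mesh renderer, a greedy per-pixel search stands in for gradient-based optimisation, and all function names are invented for this example, not taken from the authors' code.

```python
import numpy as np

def render_silhouette(center, size=10, canvas=64):
    """Toy stand-in for the differentiable mesh renderer: draws a
    square silhouette centred at (row, col) on a blank canvas."""
    mask = np.zeros((canvas, canvas), dtype=float)
    r, c = int(round(center[0])), int(round(center[1]))
    h = size // 2
    mask[max(r - h, 0):r + h, max(c - h, 0):c + h] = 1.0
    return mask

def mask_iou(a, b):
    """Intersection-over-union of two binary masks; this overlap score
    plays the role of the mask/mesh alignment objective."""
    inter = np.logical_and(a > 0.5, b > 0.5).sum()
    union = np.logical_or(a > 0.5, b > 0.5).sum()
    return inter / max(union, 1)

def refine_translation(target_mask, init_center, max_steps=100):
    """Iteratively nudge the 2D translation so the rendered silhouette
    overlaps the accurate segmentation mask: a greedy pixel search
    standing in for the paper's gradient-based optimisation."""
    center = tuple(init_center)
    best = mask_iou(render_silhouette(center), target_mask)
    for _ in range(max_steps):
        candidates = [
            (mask_iou(render_silhouette((center[0] + dr, center[1] + dc)),
                      target_mask), (center[0] + dr, center[1] + dc))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        ]
        score, nxt = max(candidates)
        if score <= best:  # no neighbouring shift improves the overlap
            break
        best, center = score, nxt
    return center, best

# A directly regressed translation is off by several pixels; the
# overlap-driven refinement walks it back onto the mask.
target = render_silhouette((40, 40))
center, iou = refine_translation(target, init_center=(34, 34))
```

In the paper the renderer is differentiable, so the translation update comes from backpropagating an alignment loss rather than from a discrete search; the loop structure, however, is the same: render, compare against the 2D mask, update the translation, repeat.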
Related papers
- Improving 2D-3D Dense Correspondences with Diffusion Models for 6D
Object Pose Estimation [9.760487761422326]
Estimating 2D-3D correspondences between RGB images and 3D space is a fundamental problem in 6D object pose estimation.
Recent pose estimators use dense correspondence maps and Perspective-n-Point (PnP) algorithms to estimate object poses.
Recent advancements in image-to-image translation have made diffusion models the superior choice when evaluated on benchmark datasets.
arXiv Detail & Related papers (2024-02-09T14:27:40Z) - Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation [14.469317161361202]
We propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information.
We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly.
arXiv Detail & Related papers (2023-08-19T13:52:18Z) - FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z) - Sampling is Matter: Point-guided 3D Human Mesh Reconstruction [0.0]
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image.
Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
arXiv Detail & Related papers (2023-04-19T08:45:26Z) - ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose
Estimation [76.31125154523056]
We present a discrete descriptor, which can represent the object surface densely.
We also propose a coarse to fine training strategy, which enables fine-grained correspondence prediction.
arXiv Detail & Related papers (2022-03-17T16:16:24Z) - NeRF-Pose: A First-Reconstruct-Then-Regress Approach for
Weakly-supervised 6D Object Pose Estimation [44.42449011619408]
We present a weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D object segmentation and known relative camera poses during training.
A NeRF-enabled RANSAC algorithm is used to estimate a stable and accurate pose from the predicted correspondences.
Experiments on LineMod-Occlusion show that the proposed method has state-of-the-art accuracy in comparison to the best 6D pose estimation methods.
arXiv Detail & Related papers (2022-03-09T15:28:02Z) - VR3Dense: Voxel Representation Learning for 3D Object Detection and
Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
It takes a LiDAR point cloud and a single RGB image as inputs during inference and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
arXiv Detail & Related papers (2021-04-13T04:25:54Z) - FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose
Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z) - Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh
Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.