DeepRM: Deep Recurrent Matching for 6D Pose Refinement
- URL: http://arxiv.org/abs/2205.14474v5
- Date: Fri, 16 Jun 2023 20:26:55 GMT
- Title: DeepRM: Deep Recurrent Matching for 6D Pose Refinement
- Authors: Alexander Avery, Andreas Savakis
- Abstract summary: DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely used, challenging datasets.
- Score: 77.34726150561087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precise 6D pose estimation of rigid objects from RGB images is a critical but
challenging task in robotics, augmented reality and human-computer interaction.
To address this problem, we propose DeepRM, a novel recurrent network
architecture for 6D pose refinement. DeepRM leverages initial coarse pose
estimates to render synthetic images of target objects. The rendered images are
then matched with the observed images to predict a rigid transform for updating
the previous pose estimate. This process is repeated to incrementally refine
the estimate at each iteration. The DeepRM architecture incorporates LSTM units
to propagate information through each refinement step, significantly improving
overall performance. In contrast to current two-stage Perspective-n-Point
(PnP) based solutions, DeepRM is trained end-to-end and uses a scalable
backbone whose trade-off between accuracy and efficiency is tuned via a
single parameter. During training, a
multi-scale optical flow head is added to predict the optical flow between the
observed and synthetic images. Optical flow prediction stabilizes the training
process, and enforces the learning of features that are relevant to the task of
pose estimation. Our results demonstrate that DeepRM achieves
state-of-the-art performance on two widely used, challenging datasets.
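As a rough illustration of the refinement loop described above, the sketch below renders the object at the current pose estimate, matches the render against the observation, and applies the predicted rigid update, with an LSTM cell carrying information across iterations. All names (Matcher, render, apply_delta) are hypothetical stand-ins with assumed interfaces, not the authors' implementation.
```python
# Minimal render-and-compare refinement loop with an LSTM carrying
# state across iterations. Illustrative only: Matcher, render, and
# apply_delta are hypothetical stand-ins, not the DeepRM implementation.
import torch
import torch.nn as nn

class Matcher(nn.Module):
    """Encodes an observed/rendered image pair and regresses a 6D pose
    update (3 rotation + 3 translation parameters)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(                # shared CNN backbone
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTMCell(feat_dim, feat_dim)  # state across steps
        self.head = nn.Linear(feat_dim, 6)           # delta pose

    def forward(self, observed, rendered, state=None):
        x = self.encoder(torch.cat([observed, rendered], dim=1))
        h, c = self.lstm(x, state)
        return self.head(h), (h, c)

def refine(pose, observed, matcher, render, apply_delta, num_iters=4):
    """Iteratively refine `pose` by matching renders against `observed`.
    `render(pose)` and `apply_delta(pose, delta)` are assumed helpers."""
    state = None
    for _ in range(num_iters):
        rendered = render(pose)          # synthetic view at current pose
        delta, state = matcher(observed, rendered, state)
        pose = apply_delta(pose, delta)  # compose rigid-transform update
    return pose
```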
Related papers
- Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior [3.2120448116996103]
This paper proposes the first self-supervised approach to deep-learning depth estimation on water scenes, built on intra-frame priors.
In the first stage, a water segmentation network is applied to separate the reflection components from the entire image.
The photometric re-projection error, incorporating SmoothL1 and a novel photometric adaptive SSIM, is formulated to optimize pose and depth estimation.
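A photometric re-projection loss mixing SmoothL1 with an SSIM term can be sketched as below; the 3x3 SSIM window and the 0.85/0.15 weighting follow common practice in self-supervised depth estimation and are assumptions, not this paper's exact formulation.
```python
# Generic photometric re-projection loss mixing SmoothL1 and SSIM, as
# used in self-supervised depth training. The 3x3 window and 0.85/0.15
# weighting follow common practice, not necessarily this paper.
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM over 3x3 neighborhoods of images in [0, 1]."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).clamp(0, 1)

def photometric_loss(target, reprojected, alpha=0.85):
    """Mix an SSIM dissimilarity term with a SmoothL1 term between the
    target frame and the image re-projected from a neighboring view."""
    ssim_term = (1 - ssim(target, reprojected)) / 2
    l1_term = F.smooth_l1_loss(reprojected, target, reduction="none")
    return (alpha * ssim_term + (1 - alpha) * l1_term).mean()
```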
arXiv Detail & Related papers (2024-04-10T17:25:42Z)
- DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
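One common way to realize such a dynamic pseudo-label update is an exponential moving average toward the current prediction, sketched below as an assumption rather than the paper's exact rule.
```python
# Hypothetical sketch: refresh pseudo-labels with the network's own
# predictions via an exponential moving average (EMA). The paper's
# exact update rule may differ; this only illustrates the idea.
import torch

def update_pseudo_label(pseudo, prediction, momentum=0.9):
    """Blend the stored pseudo-label toward the current prediction."""
    with torch.no_grad():
        return momentum * pseudo + (1.0 - momentum) * prediction.detach()
```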
arXiv Detail & Related papers (2023-12-12T06:07:21Z)
- TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement [5.482532589225552]
We propose TransPose, an improved Transformer-based 6D pose estimation network with a depth refinement module.
The architecture takes only an RGB image as input, with no supplementary modalities such as depth or thermal images.
A novel depth refinement module is then used alongside the predicted centers, 6D poses and depth patches to refine the accuracy of the estimated 6D pose.
arXiv Detail & Related papers (2023-07-09T17:33:13Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard.
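Modeling depth probabilistically usually means predicting a mean and a variance per pixel and minimizing a Gaussian negative log-likelihood; the simplified univariate sketch below only illustrates this idea, whereas the paper models a full multivariate Gaussian.
```python
# Simplified per-pixel Gaussian negative log-likelihood for depth.
# The paper models a full multivariate Gaussian; this univariate
# version only illustrates the probabilistic formulation.
import torch

def gaussian_depth_nll(mean, log_var, depth_gt):
    """mean, log_var: (B, 1, H, W) network outputs; depth_gt: targets."""
    inv_var = torch.exp(-log_var)                  # 1 / sigma^2
    sq_err = (depth_gt - mean) ** 2
    return (0.5 * inv_var * sq_err + 0.5 * log_var).mean()
```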
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- Near-field SAR Image Restoration with Deep Learning Inverse Technique: A Preliminary Study [5.489791364472879]
Near-field synthetic aperture radar (SAR) provides a high-resolution image of a target's scattering distribution (hot spots).
However, the imaging result suffers inevitable degradation from sidelobes, clutter, and noise.
To restore the image, current methods make simplified assumptions; for example, the point spread function (PSF) is spatially consistent, the target consists of sparse point scatters, etc.
We reformulate the degradation model into a spatially variable complex-convolution model, where the near-field SAR's system response is considered.
A model-based deep learning network is designed to restore the image.
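To make the degradation model concrete, the brute-force sketch below applies a position-dependent complex point spread function at every pixel; the psf_at callback and kernel size are hypothetical, and a correlation-style sum is used for simplicity.
```python
# Brute-force sketch of a spatially variable complex-convolution
# degradation model: each pixel gets its own complex PSF. The psf_at
# callback and kernel size are hypothetical; a correlation-style sum
# is used for simplicity.
import numpy as np

def degrade(scene, psf_at, ksize=5):
    """scene: complex 2D array; psf_at(i, j) -> (ksize, ksize) complex PSF."""
    h, w = scene.shape
    r = ksize // 2
    padded = np.pad(scene, r, mode="constant")
    out = np.zeros_like(scene, dtype=complex)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + ksize, j:j + ksize]
            out[i, j] = np.sum(patch * psf_at(i, j))
    return out
```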
arXiv Detail & Related papers (2022-11-28T01:28:33Z)
- Unpaired Single-Image Depth Synthesis with cycle-consistent Wasserstein GANs [1.0499611180329802]
Real-time estimation of actual environment depth is an essential module for various autonomous system tasks.
In this study, the latest advances in generative neural networks are leveraged for fully unsupervised single-image depth synthesis.
arXiv Detail & Related papers (2021-03-31T09:43:38Z)
- Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object.
We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
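A minimal form of such spatial attention is a learned per-location weight map that reweights the feature map, as sketched below; the single 1x1-conv scoring head is an illustrative simplification, not the paper's architecture.
```python
# Minimal spatial-attention sketch: a 1x1 conv scores each location and
# a softmax over the spatial grid reweights the feature map, letting the
# network focus on salient, unoccluded regions. Illustrative only.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel score

    def forward(self, feats):                  # feats: (B, C, H, W)
        b, _, h, w = feats.shape
        attn = torch.softmax(self.score(feats).view(b, 1, -1), dim=-1)
        return feats * attn.view(b, 1, h, w)   # attention-weighted features
```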
arXiv Detail & Related papers (2021-01-05T17:18:52Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras output brightness changes as a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
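Gradient-based fitting against a differentiable synthesis module reduces to a simple optimization loop over pose parameters, sketched below; synthesize is a hypothetical differentiable renderer or generator standing in for the paper's image synthesis network.
```python
# Sketch of gradient-based analysis-by-synthesis pose fitting.
# `synthesize(pose)` is a hypothetical differentiable image-synthesis
# module; the pose is optimized to explain the observed image.
import torch

def fit_pose(observed, synthesize, init_pose, steps=100, lr=1e-2):
    pose = init_pose.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        residual = synthesize(pose) - observed      # image-space error
        loss = (residual ** 2).mean()
        loss.backward()       # gradients flow through the synthesis module
        opt.step()
    return pose.detach()
```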
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
- se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains [12.71983073907091]
This work proposes a data-driven optimization approach for long-term, 6D pose tracking.
It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model.
The proposed approach achieves consistently robust estimates and outperforms alternatives even though those alternatives were trained with real images.
arXiv Detail & Related papers (2020-07-27T21:09:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.