Related papers: Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments

URL: http://arxiv.org/abs/2304.07250v4
Date: Sun, 9 Jun 2024 17:57:45 GMT
Title: Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments
Authors: Felix Ott, Lucas Heublein, David Rügamer, Bernd Bischl, Christopher Mutschler
Abstract summary: The localization of objects is a crucial task in various applications such as robotics, virtual and augmented reality, and the transportation of goods in warehouses. Recent advances in deep learning have enabled the localization using monocular visual cameras. However, both structure from motion (SfM) and absolute pose regression (APR) face challenges caused by the environment, such as motion blur, lighting changes, repetitive patterns, and feature-less structures. This study aims to address these challenges by incorporating additional information and regularizing the absolute pose using relative pose regression (RPR) methods.
Score: 13.654208446015824
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The localization of objects is a crucial task in various applications such as robotics, virtual and augmented reality, and the transportation of goods in warehouses. Recent advances in deep learning have enabled the localization using monocular visual cameras. While structure from motion (SfM) predicts the absolute pose from a point cloud, absolute pose regression (APR) methods learn a semantic understanding of the environment through neural networks. However, both fields face challenges caused by the environment such as motion blur, lighting changes, repetitive patterns, and feature-less structures. This study aims to address these challenges by incorporating additional information and regularizing the absolute pose using relative pose regression (RPR) methods. RPR methods suffer under different challenges, i.e., motion blur. The optical flow between consecutive images is computed using the Lucas-Kanade algorithm, and the relative pose is predicted using an auxiliary small recurrent convolutional network. The fusion of absolute and relative poses is a complex task due to the mismatch between the global and local coordinate systems. State-of-the-art methods fusing absolute and relative poses use pose graph optimization (PGO) to regularize the absolute pose predictions using relative poses. In this work, we propose recurrent fusion networks to optimally align absolute and relative pose predictions to improve the absolute pose prediction. We evaluate eight different recurrent units and construct a simulation environment to pre-train the APR and RPR networks for better generalized training. Additionally, we record a large database of different scenarios in a challenging large-scale indoor environment that mimics a warehouse with transportation robots. We conduct hyperparameter searches and experiments to show the effectiveness of our recurrent fusion method compared to PGO.
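Illustration (not from the paper): the abstract notes that fusing absolute and relative poses is hard because the two live in different coordinate systems. The sketch below shows only that mismatch, under a strong simplifying assumption of planar poses (x, y, heading) rather than full 6-DoF poses. It is not the authors' recurrent fusion network and not PGO, and the function names compose and relative_between are hypothetical.

```python
import math

def compose(absolute, relative):
    """Apply a relative motion (dx, dy, dtheta), given in the local frame of
    the previous camera pose, to an absolute pose (x, y, theta) in the
    global frame. Planar simplification, assumed for illustration only."""
    x, y, theta = absolute
    dx, dy, dtheta = relative
    # The local displacement must be rotated into the global frame first;
    # adding it directly would mix the two coordinate systems.
    gx = x + math.cos(theta) * dx - math.sin(theta) * dy
    gy = y + math.sin(theta) * dx + math.cos(theta) * dy
    # Wrap the heading into [-pi, pi].
    gtheta = math.atan2(math.sin(theta + dtheta), math.cos(theta + dtheta))
    return (gx, gy, gtheta)

def relative_between(pose_a, pose_b):
    """Inverse operation: express pose_b in the local frame of pose_a."""
    xa, ya, ta = pose_a
    xb, yb, tb = pose_b
    ddx, ddy = xb - xa, yb - ya
    # Rotate the global offset back into the local frame of pose_a.
    dx = math.cos(ta) * ddx + math.sin(ta) * ddy
    dy = -math.sin(ta) * ddx + math.cos(ta) * ddy
    dtheta = math.atan2(math.sin(tb - ta), math.cos(tb - ta))
    return (dx, dy, dtheta)

if __name__ == "__main__":
    previous = (1.0, 2.0, math.pi / 2)   # facing along the global +y axis
    step = (1.0, 0.0, 0.0)               # one unit "forward" in the local frame
    print(compose(previous, step))       # forward motion moves along global +y
```

A fusion method could compare the pose predicted this way from the previous absolute pose and the relative motion against the directly regressed absolute pose; how the paper's recurrent units weigh the two is not specified in the abstract.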

Related papers

VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques. We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step. Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views. We propose a distributed representation of camera pose that treats a camera as a bundle of rays. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
Learning to Localize in Unseen Scenes with Relative Pose Regressors [5.672132510411465]
Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference. In practice, however, the performance of RPRs is significantly degraded in unseen scenes. We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes. Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
arXiv Detail & Related papers (2023-03-05T17:12:50Z)
Near-filed SAR Image Restoration with Deep Learning Inverse Technique: A Preliminary Study [5.489791364472879]
Near-field synthetic aperture radar (SAR) provides a high-resolution image of a target's scattering distribution-hot spots. Meanwhile, imaging result suffers inevitable degradation from sidelobes, clutters, and noises. To restore the image, current methods make simplified assumptions; for example, the point spread function (PSF) is spatially consistent, the target consists of sparse point scatters, etc. We reformulate the degradation model into a spatially variable complex-convolution model, where the near-field SAR's system response is considered. A model-based deep learning network is designed to restore the
arXiv Detail & Related papers (2022-11-28T01:28:33Z)
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression [6.557612703872671]
Visual-inertial localization is a key problem in computer vision and robotics applications such as virtual reality, self-driving cars, and aerial vehicles. In this work, we conduct a benchmark to evaluate deep multimodal fusion based on pose graph optimization and attention networks. We show improvements for the APR-RPR task and for the RPR-RPR task for aerial vehicles and handheld devices.
arXiv Detail & Related papers (2022-08-01T15:05:26Z)
DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement. The architecture incorporates LSTM units to propagate information through each refinement step. DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images. Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z)
TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation [77.09542018140823]
We propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem. TransCamP effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes.
arXiv Detail & Related papers (2021-05-28T19:08:43Z)
Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration [67.69257782645789]
We propose piecewise transformation fields that learn 3D translation vectors to map any query point in posed space to its corresponding position in rest-pose space. We show that fitting parametric models with poses by our network results in much better registration quality, especially for extreme poses.
arXiv Detail & Related papers (2021-04-16T15:16:09Z)
Zero-Shot Reinforcement Learning with Deep Attention Convolutional Neural Networks [12.282277258055542]
We show that a deep attention convolutional neural network (DACNN) with specific visual sensor configuration performs as well as training on a dataset with high domain and parameter variation at lower computational complexity. Our new architecture adapts perception with respect to the control objective, resulting in zero-shot learning without pre-training a perception network.
arXiv Detail & Related papers (2020-01-02T19:41:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.