Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization
- URL: http://arxiv.org/abs/2404.15263v1
- Date: Tue, 23 Apr 2024 17:55:05 GMT
- Title: Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization
- Authors: Lahav Lipson, Jia Deng
- Abstract summary: Multi-Session SLAM tracks camera motion across multiple disjoint videos.
The system can connect disjoint sequences, perform visual odometry, and run global optimization.
- Score: 20.88189708122356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new system for Multi-Session SLAM, which tracks camera motion across multiple disjoint videos under a single global reference. Our approach couples the prediction of optical flow with solver layers to estimate camera pose. The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose. The full system can connect disjoint sequences, perform visual odometry, and run global optimization. Compared to existing approaches, our design is accurate and robust to catastrophic failures. Code is available at github.com/princeton-vl/MultiSlam_DiffPose
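The core technical idea is a pose solver that is itself differentiable, so the flow backbone can be supervised through the pose estimate. As a rough, generic illustration of differentiable two-view pose (not the authors' actual solver), the sketch below computes a confidence-weighted eight-point estimate of the essential matrix with PyTorch's differentiable SVD, so gradients flow back to the predicted correspondences and their weights.

```python
import torch

def weighted_essential_matrix(x1, x2, w):
    """Differentiable, confidence-weighted eight-point essential matrix estimate.

    x1, x2: (N, 2) corresponding points in normalized camera coordinates.
    w:      (N,) non-negative confidence weights (e.g. produced by a flow network).
    """
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    ones = torch.ones_like(u1)
    # Each row encodes the epipolar constraint x2^T E x1 = 0 (E flattened row-major).
    A = torch.stack([u2 * u1, u2 * v1, u2,
                     v2 * u1, v2 * v1, v2,
                     u1, v1, ones], dim=1)
    A = w.unsqueeze(1) * A
    # The smallest right singular vector of A gives E up to scale.
    _, _, Vt = torch.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Project onto the essential manifold: two equal singular values, one zero.
    U, S, Vt = torch.linalg.svd(E)
    s = (S[0] + S[1]) / 2
    return U @ torch.diag(torch.stack([s, s, torch.zeros_like(s)])) @ Vt

# Toy check that gradients reach both the matches and the confidence weights.
x1 = torch.randn(32, 2, requires_grad=True)
x2 = torch.randn(32, 2, requires_grad=True)
w = torch.rand(32, requires_grad=True)
weighted_essential_matrix(x1, x2, w).norm().backward()
```

The recovered essential matrix can then be decomposed into a relative rotation and a translation direction by standard means.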
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
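The last point refers to scoring a reference pixel against sampled points on the epipolar line in the source image. Below is a minimal, generic sketch of that sampling step; the fundamental matrix, image width, and sample count are illustrative assumptions, not the paper's API.

```python
import torch

def sample_epipolar_points(F, pix, num_samples, width):
    """Sample candidate locations along the epipolar line of a reference pixel.

    F:   (3, 3) fundamental matrix mapping reference pixels to source-image lines.
    pix: (2,) reference pixel (u, v).
    """
    x = torch.cat([pix, torch.ones(1)])
    a, b, c = F @ x                          # epipolar line a*u + b*v + c = 0
    u = torch.linspace(0, width - 1, num_samples)
    v = -(a * u + c) / b                     # assumes the line is not vertical
    return torch.stack([u, v], dim=1)        # (num_samples, 2) source pixels

# Each sampled location can then be scored against the reference feature,
# e.g. by bilinear interpolation of the source feature map at these points.
points = sample_epipolar_points(torch.randn(3, 3), torch.tensor([100.0, 80.0]), 32, 640)
```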
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation [11.611045114232187]
Recent methods only conduct view synthesis between existing camera views, leading to insufficient guidance.
We synthesize additional virtual camera views via flow-based video frame interpolation (VFI).
For multi-frame inference, to sidestep the problem of dynamic objects encountered by explicit geometry-based methods like ManyDepth, we return to the feature fusion paradigm.
We construct a unified self-supervised learning framework, named Mono-ViFI, to bilaterally connect single- and multi-frame depth.
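The supervision signal throughout this line of work is view synthesis: a source frame is warped into the target view using predicted depth and relative pose, and the photometric error drives learning. The sketch below shows only that generic warping step, not Mono-ViFI's VFI module or its specific losses; intrinsics, pose, and shapes are placeholders.

```python
import torch
import torch.nn.functional as F

def synthesize_target_view(src, depth, K, T_src_from_tgt):
    """Inverse-warp a source image into the target view (generic self-supervised depth step).

    src:   (1, 3, H, W) source image
    depth: (1, 1, H, W) predicted depth of the target view
    K:     (3, 3) camera intrinsics; T_src_from_tgt: (4, 4) relative pose
    """
    _, _, H, W = src.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)]).reshape(3, -1)
    cam = (torch.linalg.inv(K) @ pix) * depth.reshape(1, -1)        # back-project target pixels
    cam = torch.cat([cam, torch.ones(1, cam.shape[1])], dim=0)
    src_pix = K @ (T_src_from_tgt @ cam)[:3]
    src_pix = src_pix[:2] / src_pix[2:].clamp(min=1e-6)
    grid = torch.stack([2 * src_pix[0] / (W - 1) - 1,               # grid_sample expects (x, y)
                        2 * src_pix[1] / (H - 1) - 1], dim=-1).reshape(1, H, W, 2)
    return F.grid_sample(src, grid, align_corners=True)

# Photometric supervision: compare the synthesized target view to the real one, e.g.
# loss = (synthesize_target_view(src, depth, K, T) - tgt).abs().mean()
```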
arXiv Detail & Related papers (2024-07-19T08:51:51Z)
- GenS: Generalizable Neural Surface Reconstruction from Multi-View Images [20.184657468900852]
GenS is an end-to-end generalizable neural surface reconstruction model.
Our representation is more powerful, recovering high-frequency details while maintaining global smoothness.
Experiments on popular benchmarks show that our model can generalize well to new scenes.
arXiv Detail & Related papers (2024-06-04T17:13:10Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
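This entry combines a signed distance function surface with HyperNetworks that inject generalizable priors. The toy sketch below illustrates only the hypernetwork mechanism: a small network emits the weights of an SDF MLP conditioned on an image embedding. All names and sizes are assumptions, not Hyper-VolTran's architecture.

```python
import torch
import torch.nn as nn

class HyperSDF(nn.Module):
    """A hypernetwork that emits the weights of a tiny SDF MLP from an image embedding."""
    def __init__(self, embed_dim=256, hidden=64):
        super().__init__()
        self.hidden = hidden
        # Parameters of a 2-layer SDF MLP: 3 -> hidden -> 1.
        n_params = (3 * hidden + hidden) + (hidden + 1)
        self.hyper = nn.Sequential(nn.Linear(embed_dim, 512), nn.ReLU(),
                                   nn.Linear(512, n_params))

    def forward(self, embedding, points):
        # embedding: (embed_dim,) image feature; points: (N, 3) 3D query points.
        p = self.hyper(embedding)
        h = self.hidden
        w1 = p[:3 * h].reshape(h, 3)
        b1 = p[3 * h:4 * h]
        w2 = p[4 * h:5 * h].reshape(1, h)
        b2 = p[5 * h:]
        x = torch.relu(points @ w1.t() + b1)
        return x @ w2.t() + b2                 # (N, 1) predicted signed distances

signed_dist = HyperSDF()(torch.randn(256), torch.rand(1024, 3))
```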
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
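RelPose predicts a probability distribution over the relative rotation between image pairs rather than a single estimate. A loose sketch of that idea (the scorer below and its rotation candidates are placeholders for illustration) assigns a normalized score to each candidate rotation given paired image features:

```python
import torch
import torch.nn as nn

class RotationScorer(nn.Module):
    """Scores candidate relative rotations given features of an image pair."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * feat_dim + 9, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, f1, f2, rotations):
        # f1, f2: (feat_dim,) image features; rotations: (K, 3, 3) candidates.
        K = rotations.shape[0]
        pair = torch.cat([f1, f2]).expand(K, -1)
        x = torch.cat([pair, rotations.reshape(K, 9)], dim=1)
        logits = self.mlp(x).squeeze(1)
        return torch.softmax(logits, dim=0)   # probability per candidate rotation

# Toy usage with random orthogonal candidate rotations (QR of Gaussian matrices).
Q, _ = torch.linalg.qr(torch.randn(64, 3, 3))
probs = RotationScorer()(torch.randn(512), torch.randn(512), Q)
```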
arXiv Detail & Related papers (2022-08-11T17:59:59Z)
- FILM: Frame Interpolation for Large Motion [20.04001872133824]
We present a frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion.
Our approach outperforms state-of-the-art methods on the Xiph large motion benchmark.
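FILM's goal is to synthesize in-between frames from two inputs with large motion. For orientation only, here is a crude flow-warping baseline for frame interpolation, not FILM's multi-scale feature architecture; it approximates the time-t flows by scaling the bidirectional flows.

```python
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp `img` with a dense flow field via bilinear sampling.

    img: (1, 3, H, W); flow: (1, 2, H, W) in pixels, channels (dx, dy).
    """
    _, _, H, W = img.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    x = u + flow[0, 0]
    y = v + flow[0, 1]
    grid = torch.stack([2 * x / (W - 1) - 1, 2 * y / (H - 1) - 1], dim=-1)[None]
    return F.grid_sample(img, grid, align_corners=True)

def interpolate_midpoint(i0, i1, flow_0_to_1, flow_1_to_0, t=0.5):
    # Crude linear-motion approximation of the time-t flows; a learned
    # interpolator would predict these flows instead.
    warped_0 = backward_warp(i0, t * flow_1_to_0)
    warped_1 = backward_warp(i1, (1 - t) * flow_0_to_1)
    return (1 - t) * warped_0 + t * warped_1

frame_t = interpolate_midpoint(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                               torch.zeros(1, 2, 64, 64), torch.zeros(1, 2, 64, 64))
```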
arXiv Detail & Related papers (2022-02-10T08:48:18Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
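Plane sweep stereo here amounts to scoring candidate depths for a 2D detection in a reference view by projecting each depth hypothesis into another view and reading off the joint heatmap there. A stripped-down sketch of that per-joint scoring, where the camera parameters and heatmap lookup are placeholder assumptions:

```python
import torch

def sweep_depth_scores(joint_uv, depths, K_ref, K_src, R, t, heatmap_src):
    """Score candidate depths for one 2D joint by projecting into a second view.

    joint_uv: (2,) joint location in the reference view (pixels).
    depths:   (D,) candidate depth values for the plane sweep.
    R, t:     rotation (3, 3) and translation (3,) from reference to source camera.
    heatmap_src: (H, W) joint heatmap predicted in the source view.
    """
    ray = torch.linalg.inv(K_ref) @ torch.cat([joint_uv, torch.ones(1)])
    pts = depths[:, None] * ray[None, :]                   # (D, 3) 3D hypotheses
    proj = K_src @ (R @ pts.T + t[:, None])                # project into the source view
    uv = (proj[:2] / proj[2:].clamp(min=1e-6)).round().long()
    H, W = heatmap_src.shape
    u = uv[0].clamp(0, W - 1)
    v = uv[1].clamp(0, H - 1)
    return heatmap_src[v, u]                               # (D,) one score per depth

scores = sweep_depth_scores(torch.tensor([320.0, 240.0]), torch.linspace(1.0, 8.0, 64),
                            torch.eye(3), torch.eye(3), torch.eye(3),
                            torch.tensor([0.2, 0.0, 0.0]), torch.rand(480, 640))
```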
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding [57.1077544780653]
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems.
We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions.
This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
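A standard instance of a layer whose forward pass solves a non-smooth convex problem is an unrolled proximal-gradient (ISTA) solver for sparse coding; the sketch below follows that generic reading rather than the paper's game-encoded formulation, with a trainable dictionary and sparsity weight.

```python
import torch
import torch.nn as nn

class ISTALayer(nn.Module):
    """Forward pass approximately solves min_z 0.5*||x - z D^T||^2 + lam*||z||_1."""
    def __init__(self, dim=64, code_dim=128, n_steps=10):
        super().__init__()
        self.D = nn.Parameter(0.1 * torch.randn(dim, code_dim))   # trainable dictionary
        self.log_lam = nn.Parameter(torch.tensor(-2.0))           # trainable sparsity weight
        self.n_steps = n_steps

    def forward(self, x):
        # x: (B, dim). Step size 1/L, with L the Lipschitz constant of the smooth term.
        L = torch.linalg.matrix_norm(self.D, ord=2) ** 2 + 1e-6
        lam = self.log_lam.exp()
        z = torch.zeros(x.shape[0], self.D.shape[1], device=x.device)
        for _ in range(self.n_steps):                    # unrolled proximal gradient steps
            grad = (z @ self.D.t() - x) @ self.D / L
            z_half = z - grad
            z = torch.sign(z_half) * torch.clamp(z_half.abs() - lam / L, min=0)  # soft-threshold
        return z                                          # sparse codes, trainable end to end

codes = ISTALayer()(torch.randn(8, 64))
```

Because the solver is unrolled, gradients flow through the optimization steps into the dictionary and the sparsity weight, which is what makes such priors trainable end to end.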
arXiv Detail & Related papers (2020-06-26T08:34:54Z)
- 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference [67.70859730448473]
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties.
We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction.
We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
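This relocalization entry predicts multiple camera pose hypotheses with per-hypothesis uncertainty. A bare-bones sketch of such a multi-hypothesis head, where the parameterization (unit quaternions plus isotropic variances and mixture weights) is an assumption for illustration:

```python
import torch
import torch.nn as nn

class MultiHypothesisPoseHead(nn.Module):
    """Predicts K camera pose hypotheses with uncertainties from an image feature."""
    def __init__(self, feat_dim=2048, num_hyp=5):
        super().__init__()
        self.num_hyp = num_hyp
        # Per hypothesis: 3 translation + 4 quaternion + 1 log-variance + 1 mixture logit.
        self.fc = nn.Linear(feat_dim, num_hyp * 9)

    def forward(self, feat):
        out = self.fc(feat).reshape(-1, self.num_hyp, 9)
        t = out[..., :3]                                          # translations
        q = torch.nn.functional.normalize(out[..., 3:7], dim=-1)  # unit quaternions
        sigma = out[..., 7].exp()                                  # per-hypothesis uncertainty
        pi = torch.softmax(out[..., 8], dim=-1)                    # mixture weights
        return t, q, sigma, pi

t, q, sigma, pi = MultiHypothesisPoseHead()(torch.randn(2, 2048))
```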
arXiv Detail & Related papers (2020-04-09T20:55:06Z)
- Multi-object Monocular SLAM for Dynamic Environments [12.537311048732017]
The term multibody implies that we track the motion of the camera as well as that of other dynamic participants in the scene.
Existing approaches solve restricted variants of the problem, but their solutions suffer from relative scale ambiguity.
We propose a multi-pose-graph optimization formulation to resolve the relative and absolute scale factor ambiguities involved.
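Pose-graph optimization, mentioned in the last bullet, jointly refines poses so that they agree with a set of relative-pose measurements. A toy SE(2) example using plain gradient descent, unrelated to the paper's multibody formulation and ignoring angle wrap-around for brevity:

```python
import torch

def relative_se2(a, b):
    # Relative pose of b expressed in the frame of a; poses are (x, y, theta).
    dx, dy = b[0] - a[0], b[1] - a[1]
    c, s = torch.cos(a[2]), torch.sin(a[2])
    return torch.stack([c * dx + s * dy, -s * dx + c * dy, b[2] - a[2]])

poses = torch.zeros(3, 3, requires_grad=True)      # initial guess: three poses at the origin
measurements = {                                   # (i, j) -> measured relative pose of j in i
    (0, 1): torch.tensor([1.0, 0.0, 0.1]),
    (1, 2): torch.tensor([1.0, 0.2, 0.1]),
    (0, 2): torch.tensor([2.0, 0.2, 0.2]),
}
opt = torch.optim.Adam([poses], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = poses[0].pow(2).sum()                   # anchor the first pose at the origin
    for (i, j), z in measurements.items():
        r = relative_se2(poses[i], poses[j]) - z   # residual against the measurement
        loss = loss + r.pow(2).sum()
    loss.backward()
    opt.step()
```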
arXiv Detail & Related papers (2020-02-10T03:49:16Z)