Self-Improving SLAM in Dynamic Environments: Learning When to Mask
- URL: http://arxiv.org/abs/2210.08350v1
- Date: Sat, 15 Oct 2022 18:06:06 GMT
- Title: Self-Improving SLAM in Dynamic Environments: Learning When to Mask
- Authors: Adrian Bojko, Romain Dupont, Mohamed Tamaazousti, Hervé Le Borgne
- Abstract summary: We propose a novel SLAM method that learns when masking objects improves its performance in dynamic scenarios.
We make no prior assumptions about motion: our method learns by itself which moving objects to mask.
Our method matches the state of the art on the TUM RGB-D dataset and outperforms it on the KITTI and ConsInv datasets.
- Score: 5.4310785842119795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual SLAM -- Simultaneous Localization and Mapping -- in dynamic
environments typically relies on identifying and masking image features on
moving objects to prevent them from negatively affecting performance. Current
approaches are suboptimal: they either fail to mask objects when needed or, on
the contrary, mask objects needlessly. We therefore propose a novel SLAM method
that learns when masking objects improves its performance in dynamic scenarios.
Given an object segmentation method and a SLAM system, we endow the latter with
Temporal Masking: the ability to infer when certain classes of objects should
be masked to maximize a given SLAM metric. We make no prior assumptions about
motion: our method learns by itself which moving objects to mask. To avoid high
annotation costs, we created an automatic annotation method for self-supervised
training. We also constructed a new dataset, named ConsInv, which includes
challenging real-world dynamic sequences, both indoor and outdoor. Our method
matches the state of the art on the TUM RGB-D dataset and outperforms it on the
KITTI and ConsInv datasets.
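The masking decision at the heart of Temporal Masking can be sketched compactly. The Python fragment below is a minimal illustration, not the authors' implementation: it assumes a per-class policy (learned in the paper, hard-coded here) and a hypothetical `filter_keypoints` step that drops keypoints on flagged classes before they reach SLAM tracking.

```python
# Minimal sketch of the Temporal Masking decision (hypothetical names;
# not the authors' actual API). A per-class policy, learned in the paper
# but hard-coded here, filters keypoints before they reach SLAM tracking.
import numpy as np

class MaskingPolicy:
    """Decides, per object class, whether masking helps the SLAM metric."""

    def __init__(self, classes_to_mask: set):
        # In the paper this set is learned via self-supervised training;
        # a fixed set is used here purely for illustration.
        self.classes_to_mask = classes_to_mask

    def should_mask(self, class_id: int) -> bool:
        return class_id in self.classes_to_mask

def filter_keypoints(keypoints: np.ndarray, seg: np.ndarray,
                     policy: MaskingPolicy) -> np.ndarray:
    """Drop keypoints that fall on a masked class.

    keypoints: (N, 2) array of (x, y) pixel coordinates.
    seg:       (H, W) array of per-pixel class ids from any segmenter.
    """
    keep = np.array([not policy.should_mask(seg[int(y), int(x)])
                     for x, y in keypoints], dtype=bool)
    return keypoints[keep]
```

Only the surviving keypoints would then be fed to the SLAM front end; which classes belong in the policy is exactly what the paper learns per sequence.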
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
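The noise-filtering idea behind ColorMAE lends itself to a short sketch. The fragment below is an assumption-laden illustration: it uses a Gaussian low-pass filter as one example of "coloring" random noise, then thresholds at a quantile to obtain a binary mask with a target masking ratio; the paper's exact filter bank may differ.

```python
# Illustrative sketch of a binary mask generated by filtering random
# noise: low-pass filter white noise, then threshold at a quantile to
# hit the desired mask ratio. Filter choice is an assumption.
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_mask(h: int, w: int, mask_ratio: float = 0.75,
               sigma: float = 2.0, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((h, w))
    smooth = gaussian_filter(noise, sigma=sigma)  # "colored" noise
    thresh = np.quantile(smooth, mask_ratio)      # keep top (1 - ratio)
    return smooth >= thresh                       # True = visible patch
```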
- NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments [9.706447888754614]
We present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments.
We propose a new approach to enhance inaccurate regions in semantic masks, particularly in marginal areas.
We also introduce a keyframe selection strategy for dynamic scenes, which enhances camera tracking robustness against large-scale objects.
arXiv Detail & Related papers (2024-01-02T12:35:03Z)
- MaskFlow: Object-Aware Motion Estimation [0.45646200630189254]
We introduce a novel motion estimation method, MaskFlow, that is capable of estimating accurate motion fields.
In addition to the lower-level features used by other Deep Neural Network (DNN)-based motion estimation methods, MaskFlow draws on object-level features and segmentations.
arXiv Detail & Related papers (2023-11-21T09:37:49Z)
- Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations [86.47908754383198]
Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories.
Our method generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs.
Trained with just pseudo-masks, our method significantly improves mAP scores on the MS-COCO and OpenImages datasets.
arXiv Detail & Related papers (2023-03-29T17:58:39Z)
- Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition [51.67493993845143]
We reconstruct a neural volume that captures time-varying color, density, scene flow, semantics, and attention information.
The semantics and attention let us identify salient foreground objects separately from the background across spacetime.
We show that this method can decompose dynamic scenes in an unsupervised way with competitive performance to a supervised method.
arXiv Detail & Related papers (2023-03-02T19:00:05Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- TwistSLAM: Constrained SLAM in Dynamic Environment [0.0]
We present TwistSLAM, a semantic, dynamic, stereo SLAM system that can track dynamic objects in the scene.
Our algorithm creates clusters of points according to their semantic class.
It uses the static parts of the environment to robustly localize the camera and tracks the remaining objects.
arXiv Detail & Related papers (2022-02-24T22:08:45Z)
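TwistSLAM's split between static and dynamic content can be outlined in a few lines. In the hypothetical sketch below, `STATIC_CLASSES`, `estimate_pose`, and `track_object` are placeholders standing in for the system's actual components, not its real API.

```python
# Illustrative sketch: partition map points by semantic class, estimate
# the camera pose from static classes only, and track the rest per class.
from collections import defaultdict

STATIC_CLASSES = {"road", "building", "vegetation"}  # assumed static

def split_by_class(points):
    """points: iterable of (xyz, class_name) map points."""
    clusters = defaultdict(list)
    for xyz, cls in points:
        clusters[cls].append(xyz)
    return clusters

def localize_and_track(points, estimate_pose, track_object):
    clusters = split_by_class(points)
    static = [p for cls in clusters if cls in STATIC_CLASSES
              for p in clusters[cls]]
    pose = estimate_pose(static)  # camera pose from the static scene
    tracks = {cls: track_object(clusters[cls], pose)
              for cls in clusters if cls not in STATIC_CLASSES}
    return pose, tracks
```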
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization, leading to state-of-the-art results in both VOS and the more challenging tracking domain.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
- Learning to Segment Dynamic Objects using SLAM Outliers [5.4310785842119795]
We present a method to automatically learn to segment dynamic objects using SLAM outliers.
It requires only one monocular sequence per dynamic object for training and consists of localizing dynamic objects using SLAM outliers, creating their masks, and using these masks to train a semantic segmentation network.
arXiv Detail & Related papers (2020-11-12T08:36:54Z)
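The outlier-to-mask pipeline summarized in the entry above invites a rough sketch. The version below clusters outlier keypoints with DBSCAN and emits box-shaped pseudo-masks; both the clustering parameters and the box shape are illustrative assumptions rather than the paper's recipe.

```python
# Rough sketch: cluster SLAM reprojection-outlier keypoints and turn each
# cluster into a box pseudo-mask for training a segmentation network.
import numpy as np
from sklearn.cluster import DBSCAN

def outliers_to_masks(outlier_pts: np.ndarray, h: int, w: int,
                      eps: float = 30.0, min_samples: int = 10):
    """outlier_pts: (N, 2) pixel coords of SLAM reprojection outliers."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(outlier_pts)
    masks = []
    for lbl in set(labels) - {-1}:            # -1 = unclustered noise
        pts = outlier_pts[labels == lbl]
        x0, y0 = pts.min(axis=0).astype(int)
        x1, y1 = pts.max(axis=0).astype(int)
        mask = np.zeros((h, w), dtype=bool)
        mask[y0:y1 + 1, x0:x1 + 1] = True     # box pseudo-mask
        masks.append(mask)
    return masks  # pseudo-labels for a semantic segmentation network
```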
- DOT: Dynamic Object Tracking for Visual SLAM [83.69544718120167]
DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects.
To determine which objects are actually moving, DOT first segments instances of potentially dynamic objects and then, using the estimated camera motion, tracks them by minimizing the photometric reprojection error.
Our results show that our approach significantly improves the accuracy and robustness of ORB-SLAM 2, especially in highly dynamic scenes.
arXiv Detail & Related papers (2020-09-30T18:36:28Z)
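DOT's motion test, deciding whether an object actually moves, can be approximated as follows. This is a hedged sketch rather than the real system, which optimizes the photometric residual instead of merely measuring it: object pixels are warped with the estimated camera motion only, using grayscale images and a nearest-pixel lookup, so a large residual suggests the object itself moved.

```python
# Hedged sketch of a photometric reprojection test for object motion:
# warp an object's pixels with the estimated *camera* motion only; a
# large residual suggests the object itself moved.
import numpy as np

def photometric_residual(img_t, img_t1, obj_px, depth_t, K, T_cam):
    """img_t, img_t1: (H, W) grayscale frames; obj_px: (N, 2) pixel
    coords on the object in frame t; depth_t: (H, W) depth map;
    K: 3x3 intrinsics; T_cam: 4x4 estimated camera motion, t -> t+1."""
    K_inv = np.linalg.inv(K)
    residuals = []
    for x, y in obj_px.astype(int):
        p = depth_t[y, x] * (K_inv @ np.array([x, y, 1.0]))  # backproject
        p1 = (T_cam @ np.append(p, 1.0))[:3]                 # apply motion
        u = K @ (p1 / p1[2])                                 # reproject
        ux, uy = int(round(u[0])), int(round(u[1]))
        if 0 <= uy < img_t1.shape[0] and 0 <= ux < img_t1.shape[1]:
            residuals.append(abs(float(img_t1[uy, ux]) - float(img_t[y, x])))
    return np.mean(residuals) if residuals else np.inf  # high -> dynamic
```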
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.