Self-Improving SLAM in Dynamic Environments: Learning When to Mask
- URL: http://arxiv.org/abs/2210.08350v1
- Date: Sat, 15 Oct 2022 18:06:06 GMT
- Title: Self-Improving SLAM in Dynamic Environments: Learning When to Mask
- Authors: Adrian Bojko, Romain Dupont, Mohamed Tamaazousti, Hervé Le Borgne
- Abstract summary: We propose a novel SLAM method that learns when masking objects improves its performance in dynamic scenarios.
We make no prior assumptions about motion: our method learns by itself which moving objects to mask.
Our method matches the state of the art on the TUM RGB-D dataset and outperforms it on the KITTI and ConsInv datasets.
- Score: 5.4310785842119795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual SLAM -- Simultaneous Localization and Mapping -- in dynamic
environments typically relies on identifying and masking image features on
moving objects to prevent them from negatively affecting performance. Current
approaches are suboptimal: they either fail to mask objects when needed or, on
the contrary, mask objects needlessly. We therefore propose a novel SLAM method
that learns when masking objects improves its performance in dynamic scenarios.
Given an object segmentation method and a SLAM system, we endow the latter with
Temporal Masking: the ability to infer when certain classes of objects should
be masked to maximize a given SLAM metric. We make no prior assumptions about
motion: our method learns by itself which moving objects to mask. To avoid high
annotation costs, we created an automatic annotation method for self-supervised
training. We also constructed a new dataset, named ConsInv, which includes
challenging real-world dynamic sequences, both indoor and outdoor. Our method
matches the state of the art on the TUM RGB-D dataset and outperforms it on the
KITTI and ConsInv datasets.
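The masking decision at the heart of Temporal Masking can be sketched compactly. The Python fragment below is a minimal illustration, not the authors' implementation: it assumes a per-class policy (learned in the paper, hard-coded here) and a hypothetical `filter_keypoints` step that drops keypoints on flagged classes before they reach SLAM tracking.

```python
# Minimal sketch of the Temporal Masking decision (hypothetical names;
# not the authors' actual API). A per-class policy, learned in the paper
# but hard-coded here, filters keypoints before they reach SLAM tracking.
import numpy as np

class MaskingPolicy:
    """Decides, per object class, whether masking helps the SLAM metric."""

    def __init__(self, classes_to_mask: set):
        # In the paper this set is learned via self-supervised training;
        # a fixed set is used here purely for illustration.
        self.classes_to_mask = classes_to_mask

    def should_mask(self, class_id: int) -> bool:
        return class_id in self.classes_to_mask

def filter_keypoints(keypoints: np.ndarray, seg: np.ndarray,
                     policy: MaskingPolicy) -> np.ndarray:
    """Drop keypoints that fall on a masked class.

    keypoints: (N, 2) array of (x, y) pixel coordinates.
    seg:       (H, W) array of per-pixel class ids from any segmenter.
    """
    keep = np.array([not policy.should_mask(seg[int(y), int(x)])
                     for x, y in keypoints], dtype=bool)
    return keypoints[keep]
```

Only the surviving keypoints would then be fed to the SLAM front end; which classes belong in the policy is exactly what the paper learns per sequence.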
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
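The noise-filtering idea behind ColorMAE lends itself to a short sketch. The fragment below is an assumption-laden illustration: it uses a Gaussian low-pass filter as one example of "coloring" random noise, then thresholds at a quantile to obtain a binary mask with a target masking ratio; the paper's exact filter bank may differ.

```python
# Illustrative sketch of a binary mask generated by filtering random
# noise: low-pass filter white noise, then threshold at a quantile to
# hit the desired mask ratio. Filter choice is an assumption.
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_mask(h: int, w: int, mask_ratio: float = 0.75,
               sigma: float = 2.0, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((h, w))
    smooth = gaussian_filter(noise, sigma=sigma)  # "colored" noise
    thresh = np.quantile(smooth, mask_ratio)      # keep top (1 - ratio)
    return smooth >= thresh                       # True = visible patch
```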
- NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments [9.706447888754614]
We present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments.
We propose a new approach to enhance inaccurate regions in semantic masks, particularly in marginal areas.
We also introduce a keyframe selection strategy for dynamic scenes, which enhances camera tracking robustness against large-scale objects.
arXiv Detail & Related papers (2024-01-02T12:35:03Z)
- MaskFlow: Object-Aware Motion Estimation [0.45646200630189254]
We introduce a novel motion estimation method, MaskFlow, that is capable of estimating accurate motion fields.
In addition to the lower-level features used by other Deep Neural Network (DNN)-based motion estimation methods, MaskFlow draws on object-level features and segmentations.
arXiv Detail & Related papers (2023-11-21T09:37:49Z)
- Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations [86.47908754383198]
Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories.
Our method generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs.
Trained with just pseudo-masks, our method significantly improves mAP scores on the MS-COCO and OpenImages datasets.
arXiv Detail & Related papers (2023-03-29T17:58:39Z)
- Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition [51.67493993845143]
We reconstruct a neural volume that captures time-varying color, density, scene flow, semantics, and attention information.
The semantics and attention let us identify salient foreground objects separately from the background across spacetime.
We show that this method can decompose dynamic scenes in an unsupervised way with competitive performance to a supervised method.
arXiv Detail & Related papers (2023-03-02T19:00:05Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- TwistSLAM: Constrained SLAM in Dynamic Environment [0.0]
We present TwistSLAM, a semantic, dynamic, stereo SLAM system that can track dynamic objects in the scene.
Our algorithm creates clusters of points according to their semantic class.
It uses the static parts of the environment to robustly localize the camera and tracks the remaining objects.
arXiv Detail & Related papers (2022-02-24T22:08:45Z)
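TwistSLAM's split between static and dynamic content can be outlined in a few lines. In the hypothetical sketch below, `STATIC_CLASSES`, `estimate_pose`, and `track_object` are placeholders standing in for the system's actual components, not its real API.

```python
# Illustrative sketch: partition map points by semantic class, estimate
# the camera pose from static classes only, and track the rest per class.
from collections import defaultdict

STATIC_CLASSES = {"road", "building", "vegetation"}  # assumed static

def split_by_class(points):
    """points: iterable of (xyz, class_name) map points."""
    clusters = defaultdict(list)
    for xyz, cls in points:
        clusters[cls].append(xyz)
    return clusters

def localize_and_track(points, estimate_pose, track_object):
    clusters = split_by_class(points)
    static = [p for cls in clusters if cls in STATIC_CLASSES
              for p in clusters[cls]]
    pose = estimate_pose(static)  # camera pose from the static scene
    tracks = {cls: track_object(clusters[cls], pose)
              for cls in clusters if cls not in STATIC_CLASSES}
    return pose, tracks
```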
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization, leading to state-of-the-art results in both VOS and the more challenging tracking domain.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
- Learning to Segment Dynamic Objects using SLAM Outliers [5.4310785842119795]
We present a method to automatically learn to segment dynamic objects using SLAM outliers.
It requires only one monocular sequence per dynamic object for training and consists of localizing dynamic objects using SLAM outliers, creating their masks, and using these masks to train a semantic segmentation network.
arXiv Detail & Related papers (2020-11-12T08:36:54Z)
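The outlier-to-mask pipeline summarized in the entry above invites a rough sketch. The version below clusters outlier keypoints with DBSCAN and emits box-shaped pseudo-masks; both the clustering parameters and the box shape are illustrative assumptions rather than the paper's recipe.

```python
# Rough sketch: cluster SLAM reprojection-outlier keypoints and turn each
# cluster into a box pseudo-mask for training a segmentation network.
import numpy as np
from sklearn.cluster import DBSCAN

def outliers_to_masks(outlier_pts: np.ndarray, h: int, w: int,
                      eps: float = 30.0, min_samples: int = 10):
    """outlier_pts: (N, 2) pixel coords of SLAM reprojection outliers."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(outlier_pts)
    masks = []
    for lbl in set(labels) - {-1}:            # -1 = unclustered noise
        pts = outlier_pts[labels == lbl]
        x0, y0 = pts.min(axis=0).astype(int)
        x1, y1 = pts.max(axis=0).astype(int)
        mask = np.zeros((h, w), dtype=bool)
        mask[y0:y1 + 1, x0:x1 + 1] = True     # box pseudo-mask
        masks.append(mask)
    return masks  # pseudo-labels for a semantic segmentation network
```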
- DOT: Dynamic Object Tracking for Visual SLAM [83.69544718120167]
DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects.
To determine which objects are actually moving, DOT first segments instances of potentially dynamic objects and then, using the estimated camera motion, tracks them by minimizing the photometric reprojection error.
Our results show that our approach significantly improves the accuracy and robustness of ORB-SLAM 2, especially in highly dynamic scenes.
arXiv Detail & Related papers (2020-09-30T18:36:28Z)
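DOT's motion test, deciding whether an object actually moves, can be approximated as follows. This is a hedged sketch rather than the real system, which optimizes the photometric residual instead of merely measuring it: object pixels are warped with the estimated camera motion only, using grayscale images and a nearest-pixel lookup, so a large residual suggests the object itself moved.

```python
# Hedged sketch of a photometric reprojection test for object motion:
# warp an object's pixels with the estimated *camera* motion only; a
# large residual suggests the object itself moved.
import numpy as np

def photometric_residual(img_t, img_t1, obj_px, depth_t, K, T_cam):
    """img_t, img_t1: (H, W) grayscale frames; obj_px: (N, 2) pixel
    coords on the object in frame t; depth_t: (H, W) depth map;
    K: 3x3 intrinsics; T_cam: 4x4 estimated camera motion, t -> t+1."""
    K_inv = np.linalg.inv(K)
    residuals = []
    for x, y in obj_px.astype(int):
        p = depth_t[y, x] * (K_inv @ np.array([x, y, 1.0]))  # backproject
        p1 = (T_cam @ np.append(p, 1.0))[:3]                 # apply motion
        u = K @ (p1 / p1[2])                                 # reproject
        ux, uy = int(round(u[0])), int(round(u[1]))
        if 0 <= uy < img_t1.shape[0] and 0 <= ux < img_t1.shape[1]:
            residuals.append(abs(float(img_t1[uy, ux]) - float(img_t[y, x])))
    return np.mean(residuals) if residuals else np.inf  # high -> dynamic
```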
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.