GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model
- URL: http://arxiv.org/abs/2411.19289v1
- Date: Thu, 28 Nov 2024 17:41:33 GMT
- Title: GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model
- Authors: Rui Zhou, Jingbin Liu, Junbin Xie, Jianyu Zhang, Yingze Hu, Jiele Zhao
- Abstract summary: We introduce GMS-VINS, which integrates an enhanced SORT algorithm along with a robust multi-category segmentation framework into visual-inertial odometry (VIO).
The enhanced SORT algorithm significantly improves the reliability of tracking multiple dynamic objects.
Our proposed method performs impressively in multiple scenarios, outperforming other state-of-the-art methods.
- Score: 7.07379964916809
- Abstract: Visual-inertial odometry (VIO) is widely used in various fields, such as robots, drones, and autonomous vehicles, due to its low cost and complementary sensors. Most VIO methods presuppose that observed objects are static and time-invariant. However, real-world scenes often feature dynamic objects, compromising the accuracy of pose estimation. These moving entities include cars, trucks, buses, motorcycles, and pedestrians. The diversity and partial occlusion of these objects present a tough challenge for existing dynamic object removal techniques. To tackle this challenge, we introduce GMS-VINS, which integrates an enhanced SORT algorithm along with a robust multi-category segmentation framework into VIO, thereby improving pose estimation accuracy in environments with diverse dynamic objects and frequent occlusions. Leveraging the promptable foundation model, our solution efficiently tracks and segments a wide range of object categories. The enhanced SORT algorithm significantly improves the reliability of tracking multiple dynamic objects, especially in urban settings with partial occlusions or swift movements. We evaluated our proposed method using multiple public datasets representing various scenes, as well as in a real-world scenario involving diverse dynamic objects. The experimental results demonstrate that our proposed method performs impressively in multiple scenarios, outperforming other state-of-the-art methods. This highlights its remarkable generalization and adaptability in diverse dynamic environments, showcasing its potential to handle various dynamic objects in practical applications.
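As a rough illustration of how such a pipeline can feed a VIO front end, the sketch below unions per-object masks from a promptable segmentation model over boxes maintained by a SORT-style tracker, then drops feature points that fall inside the mask before pose estimation. This is a minimal sketch under stated assumptions, not the authors' implementation; `segment_prompted`, the tracker detections, and the feature objects are hypothetical placeholders.

```python
# Minimal sketch (not the authors' released code): use per-frame masks of
# tracked dynamic objects to discard feature points before VIO pose estimation.
# segment_prompted, the tracker detections, and the feature objects are
# hypothetical placeholders standing in for the paper's actual components.
import numpy as np

DYNAMIC_CLASSES = {"car", "truck", "bus", "motorcycle", "pedestrian"}

def build_dynamic_mask(image, tracked_detections, segment_prompted):
    """Union of instance masks for tracked objects in dynamic categories."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for det in tracked_detections:          # boxes kept alive by a SORT-style tracker
        if det.label in DYNAMIC_CLASSES:
            # assumed: the promptable model returns an HxW boolean mask for a box prompt
            mask |= segment_prompted(image, box=det.box)
    return mask

def filter_static_features(features, dynamic_mask):
    """Keep only feature points that fall outside the dynamic-object mask."""
    kept = []
    for f in features:                      # f.uv = (u, v) pixel coordinates
        u, v = int(round(f.uv[0])), int(round(f.uv[1]))
        if 0 <= v < dynamic_mask.shape[0] and 0 <= u < dynamic_mask.shape[1] \
                and not dynamic_mask[v, u]:
            kept.append(f)
    return kept
```

In a setup like this, the masking runs in the image front end, so the back-end optimizer only ever sees features assumed to lie on static structure.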
Related papers
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments [28.23284296418962]
Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments.
Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object diversity, and scene texts.
We propose a dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE)
DOZE comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios.
arXiv Detail & Related papers (2024-02-29T10:03:57Z) - Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization [61.64304227831361]
Single-domain generalization aims to learn a model from single source domain data to achieve generalized performance on other unseen target domains.
We propose a dynamic object-centric perception network based on prompt learning, aiming to adapt to the variations in image complexity.
arXiv Detail & Related papers (2024-02-28T16:16:51Z) - Clustering in Dynamic Environments: A Framework for Benchmark Dataset Generation With Heterogeneous Changes [11.56518009058007]
Clustering in dynamic environments is of increasing importance, with broad applications ranging from real-time data analysis and online unsupervised learning to dynamic facility location problems.
While meta-heuristics have shown promising effectiveness in static clustering tasks, their application to tracking optimal clustering solutions or robust clustering over time in dynamic environments remains largely underexplored.
This is partly due to a lack of dynamic datasets with diverse, controllable, and realistic dynamic characteristics, hindering systematic performance evaluations of clustering algorithms in various dynamic scenarios.
This deficiency leads to a gap in our understanding and capability to effectively design algorithms for clustering in dynamic environments.
arXiv Detail & Related papers (2024-02-24T05:49:27Z) - ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) methods attempt to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior of zooming in and out when observing vague images and camouflaged objects.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - Learning Robust Dynamics through Variational Sparse Gating [18.476155786474358]
In environments with many objects, often only a small number of them are moving or interacting at the same time.
In this paper, we investigate integrating this inductive bias of sparse interactions into the latent dynamics of world models trained from pixels.
arXiv Detail & Related papers (2022-10-21T02:56:51Z) - TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction [57.1209039399599]
We propose a map representation that allows maintaining a single volume for the entire scene and all the objects therein.
In a multiple dynamic object tracking and reconstruction scenario, our representation allows maintaining accurate reconstruction of surfaces even while they become temporarily occluded by other objects moving in their proximity.
We evaluate the proposed TSDF++ formulation on a public synthetic dataset and demonstrate its ability to preserve reconstructions of occluded surfaces when compared to the standard TSDF map representation.
arXiv Detail & Related papers (2021-05-16T16:15:05Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - RP-VIO: Robust Plane-based Visual-Inertial Odometry for Dynamic Environments [14.260575326111585]
We present RP-VIO, a state-of-the-art visual-inertial odometry system for dynamic environments.
We also present a highly dynamic, photorealistic synthetic dataset for a more effective evaluation of the capabilities of modern VINS systems.
arXiv Detail & Related papers (2021-03-18T17:33:07Z) - DOT: Dynamic Object Tracking for Visual SLAM [83.69544718120167]
DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects.
To determine which objects are actually moving, DOT first segments instances of potentially dynamic objects and then, using the estimated camera motion, tracks such objects by minimizing the photometric reprojection error (a minimal sketch of this check appears after this list).
Our results show that our approach significantly improves the accuracy and robustness of ORB-SLAM 2, especially in highly dynamic scenes.
arXiv Detail & Related papers (2020-09-30T18:36:28Z) - Multi-object Monocular SLAM for Dynamic Environments [12.537311048732017]
The term multibody implies that we track the motion of the camera as well as that of other dynamic participants in the scene.
Existing approaches solve restricted variants of the problem, but their solutions suffer from relative scale ambiguity.
We propose a multi-pose-graph optimization formulation to resolve the relative and absolute scale factor ambiguities involved.
arXiv Detail & Related papers (2020-02-10T03:49:16Z)
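The photometric check described in the DOT entry above can be illustrated with a short sketch: warp a pixel into the next frame using the estimated camera motion and its depth, and flag it as potentially dynamic if the intensity residual is large. This is an assumption-laden illustration rather than the DOT authors' code; it presumes grayscale images, a pinhole intrinsic matrix K, and a known per-pixel depth.

```python
# Illustrative sketch of a per-pixel photometric reprojection residual
# (not the DOT authors' code). Assumes grayscale images I_k, I_k1,
# pinhole intrinsics K (3x3), and a 4x4 rigid transform T_k1_k from
# frame k to frame k+1 estimated by the SLAM/VIO system.
import numpy as np

def photometric_residual(I_k, I_k1, u, v, depth, K, T_k1_k):
    """|I_k(u, v) - I_{k+1}(warp(u, v))| for one pixel, or None if the
    warped point falls outside frame k+1."""
    p_k = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))   # back-project pixel
    p_k1 = T_k1_k[:3, :3] @ p_k + T_k1_k[:3, 3]                # apply camera motion
    uv_h = K @ (p_k1 / p_k1[2])                                # re-project to frame k+1
    u1, v1 = int(round(uv_h[0])), int(round(uv_h[1]))
    if not (0 <= v1 < I_k1.shape[0] and 0 <= u1 < I_k1.shape[1]):
        return None                                            # warped out of view
    return abs(float(I_k[v, u]) - float(I_k1[v1, u1]))
```

A large residual under the estimated (static-scene) camera motion suggests the pixel belongs to an independently moving object and should not constrain the pose estimate.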