Related papers: From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking

From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking

URL: http://arxiv.org/abs/2406.16850v1
Date: Mon, 24 Jun 2024 17:57:05 GMT
Title: From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking
Authors: Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang,
Abstract summary: Embodied agents require robust navigation systems to operate in unstructured environments. We propose a novel, customizable pipeline for noisy data synthesis. Our analysis uncovers the susceptibilities of both neural (NeRF) and non-neural SLAM models to disturbances.
Score: 32.52171076424419
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Embodied agents require robust navigation systems to operate in unstructured environments, making the robustness of Simultaneous Localization and Mapping (SLAM) models critical to embodied agent autonomy. While real-world datasets are invaluable, simulation-based benchmarks offer a scalable approach for robustness evaluations. However, the creation of a challenging and controllable noisy world with diverse perturbations remains under-explored. To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations. The pipeline comprises a comprehensive taxonomy of sensor and motion perturbations for embodied multi-modal (specifically RGB-D) sensing, categorized by their sources and propagation order, allowing for procedural composition. We also provide a toolbox for synthesizing these perturbations, enabling the transformation of clean environments into challenging noisy simulations. Utilizing the pipeline, we instantiate the large-scale Noisy-Replica benchmark, which includes diverse perturbation types, to evaluate the risk tolerance of existing advanced RGB-D SLAM models. Our extensive analysis uncovers the susceptibilities of both neural (NeRF and Gaussian Splatting -based) and non-neural SLAM models to disturbances, despite their demonstrated accuracy in standard benchmarks. Our code is publicly available at https://github.com/Xiaohao-Xu/SLAM-under-Perturbation.

Related papers

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents.<n>MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios.<n>Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
Open-set Anomaly Segmentation in Complex Scenarios [88.11076112792992]
This paper introduces ComsAmy, a benchmark for open-set anomaly segmentation in complex scenarios. ComsAmy encompasses a wide spectrum of adverse weather conditions, dynamic driving environments, and diverse anomaly types. We propose a novel energy-entropy learning (EEL) strategy that integrates the complementary information from energy and entropy.
arXiv Detail & Related papers (2025-04-28T12:00:10Z)
Whenever, Wherever: Towards Orchestrating Crowd Simulations with Spatio-Temporal Spawn Dynamics [65.72663487116439]
We propose nTPP-GMM that models spawn-temporal spawn dynamics using Neural Temporal Point Processes. We evaluate our approach by simulations of three diverse real-world datasets with nTPP-GMM.
arXiv Detail & Related papers (2025-03-20T18:46:41Z)
Training-free Quantum-Inspired Image Edge Extraction Method [4.8188571652305185]
We propose a training-free, quantum-inspired edge detection model. Our approach integrates classical Sobel edge detection, the Schr"odinger wave equation refinement, and a hybrid framework. By eliminating the need for training, the model is lightweight and adaptable to diverse applications.
arXiv Detail & Related papers (2025-01-31T07:24:38Z)
Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video [30.89206445146674]
We aim to redefine robust ego-motion estimation and photorealistic 3D reconstruction by addressing a critical limitation: reliance on noise-free data. We tackle three core challenges: scalable data generation, comprehensive robustness, and model enhancement. We create Robust-Ego3D, a benchmark rigorously designed to expose noise-induced performance degradation.
arXiv Detail & Related papers (2025-01-24T08:25:48Z)
Sparse identification of nonlinear dynamics and Koopman operators with Shallow Recurrent Decoder Networks [3.1484174280822845]
We present a method to jointly solve the sensing and model identification problems with simple implementation, efficient, and robust performance. SINDy-SHRED uses Gated Recurrent Units to model sparse sensor measurements along with a shallow network decoder to reconstruct the full-temporal field from the latent state space. We conduct systematic experimental studies on PDE data such as turbulent flows, real-world sensor measurements for sea surface temperature, and direct video data.
arXiv Detail & Related papers (2025-01-23T02:18:13Z)
Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. Traditional encoder-decoder architectures may not have adequately considered the robustness against noise originating from defective modalities. We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
Addressing Misspecification in Simulation-based Inference through Data-driven Calibration [43.811367860375825]
Recent work has demonstrated that model misspecification can harm simulation-based inference's reliability. This work introduces robust posterior estimation (ROPE), a framework that overcomes model misspecification with a small real-world calibration set of ground truth parameter measurements.
arXiv Detail & Related papers (2024-05-14T16:04:39Z)
Koopman-Based Surrogate Modelling of Turbulent Rayleigh-Bénard Convection [4.248022697109535]
We use a Koopman-inspired architecture called the Linear Recurrent Autoencoder Network (LRAN) for learning reduced-order dynamics in convection flows. A traditional fluid dynamics method, the Kernel Dynamic Mode Decomposition (KDMD) is used to compare the LRAN. We obtained more accurate predictions with the LRAN than with KDMD in the most turbulent setting.
arXiv Detail & Related papers (2024-05-10T12:15:02Z)
Customizable Perturbation Synthesis for Robust SLAM Benchmarking [33.74471840597803]
We propose a novel, customizable pipeline for noisy data synthesis. This pipeline incorporates customizable hardware setups, software components, and perturbed environments. We instantiate the Robust-SLAM benchmark, which includes diverse perturbation types, to evaluate the risk tolerance of existing advanced SLAM models.
arXiv Detail & Related papers (2024-02-12T23:49:40Z)
DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions [52.63323657077447]
We propose DNMOT, an end-to-end trainable DeNoising Transformer for multiple object tracking. Specifically, we augment the trajectory with noises during training and make our model learn the denoising process in an encoder-decoder architecture. We conduct extensive experiments on the MOT17, MOT20, and DanceTrack datasets, and the experimental results show that our method outperforms previous state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2023-09-09T04:40:01Z)
A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy [2.5109359014278954]
We propose a novel generative model within the Bayesian non-parametric learning (BNPL) framework to address some notable failure modes in generative adversarial networks (GANs) and variational autoencoders (VAEs)<n>We will demonstrate that the BNPL framework enhances training stability and provides robustness and accuracy guarantees when incorporating the Wasserstein distance and maximum mean discrepancy measure (WMMD) into our model's loss function.
arXiv Detail & Related papers (2023-08-27T08:58:31Z)
Realistic Noise Synthesis with Diffusion Models [68.48859665320828]
Deep image denoising models often rely on large amount of training data for the high quality performance. We propose a novel method that synthesizes realistic noise using diffusion models, namely Realistic Noise Synthesize Diffusor (RNSD) RNSD can incorporate guided multiscale content, such as more realistic noise with spatial correlations can be generated at multiple frequencies.
arXiv Detail & Related papers (2023-05-23T12:56:01Z)
HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments [0.0]
This study uses domain randomization to generate a synthetic RGB-D dataset for training multimodal instance segmentation models. We show that our approach enables the models to outperform corresponding models trained on existing state-of-the-art datasets.
arXiv Detail & Related papers (2023-04-12T13:02:08Z)
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a textitgap between clean data training and real-world inference. We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedding into similar vector space. Experiments on the widely-used dataset, Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series. Our model parameterizes mean and variance for each time-stamp with flexible neural networks. We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting. HSRNet models rich contextual dependencies and recalibrating multiple scale-associated information. Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.