MambaLoc: Efficient Camera Localisation via State Space Model
- URL: http://arxiv.org/abs/2408.09680v2
- Date: Tue, 20 Aug 2024 08:44:42 GMT
- Title: MambaLoc: Efficient Camera Localisation via State Space Model
- Authors: Jialu Wang, Kaichen Zhou, Andrew Markham, Niki Trigoni,
- Abstract summary: Location information is pivotal for the automation and intelligence of terminal devices and edge-cloud IoT systems, such as autonomous vehicles and augmented reality.
achieving reliable positioning across diverse IoT applications remains challenging due to significant training costs and the necessity of densely collected data.
We have innovatively applied the selective state space (SSM) model to visual localization, introducing a new model named MambaLoc.
- Score: 42.85368902409545
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Location information is pivotal for the automation and intelligence of terminal devices and edge-cloud IoT systems, such as autonomous vehicles and augmented reality. However, achieving reliable positioning across diverse IoT applications remains challenging due to significant training costs and the necessity of densely collected data. To tackle these issues, we have innovatively applied the selective state space (SSM) model to visual localization, introducing a new model named MambaLoc. The proposed model demonstrates exceptional training efficiency by capitalizing on the SSM model's strengths in efficient feature extraction, rapid computation, and memory optimization, and it further ensures robustness in sparse data environments due to its parameter sparsity. Additionally, we propose the Global Information Selector (GIS), which leverages selective SSM to implicitly achieve the efficient global feature extraction capabilities of Non-local Neural Networks. This design leverages the computational efficiency of the SSM model alongside the Non-local Neural Networks' capacity to capture long-range dependencies with minimal layers. Consequently, the GIS enables effective global information capture while significantly accelerating convergence. Our extensive experimental validation using public indoor and outdoor datasets first demonstrates our model's effectiveness, followed by evidence of its versatility with various existing localization models. Our code and models are publicly available to support further research and development in this area.
Related papers
- Meta-Learning for Physically-Constrained Neural System Identification [9.417562391585076]
We present a gradient-based meta-learning framework for rapid adaptation of neural state-space models (NSSMs) for black-box system identification.
We show that the meta-learned models result in improved downstream performance in model-based state estimation in indoor localization and energy systems.
arXiv Detail & Related papers (2025-01-10T18:46:28Z) - Hyperspectral Images Efficient Spatial and Spectral non-Linear Model with Bidirectional Feature Learning [7.06787067270941]
We propose a novel framework that significantly reduces data volume while enhancing classification accuracy.
Our model employs a bidirectional reversed convolutional neural network (CNN) to efficiently extract spectral features, complemented by a specialized block for spatial feature analysis.
arXiv Detail & Related papers (2024-11-29T23:32:26Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z) - Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations [1.5723316845301678]
This report introduces a novel methodology for training with augmentations to enhance model robustness and performance in such conditions.
We present a comprehensive framework that includes identifying weak spots in Machine Learning models, selecting suitable augmentations, and devising effective training strategies.
Experimental results demonstrate improvements in model performance, as measured by commonly used metrics such as mean Average Precision (mAP) and mean Intersection over Union (mIoU) on open-source object detection and semantic segmentation models and datasets.
arXiv Detail & Related papers (2024-08-30T14:15:48Z) - Interpreting and Improving Attention From the Perspective of Large Kernel Convolution [51.06461246235176]
We introduce Large Kernel Convolutional Attention (LKCA), a novel formulation that reinterprets attention operations as a single large- Kernel convolution.
LKCA achieves competitive performance across various visual tasks, particularly in data-constrained settings.
arXiv Detail & Related papers (2024-01-11T08:40:35Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - A Generative Self-Supervised Framework using Functional Connectivity in
fMRI Data [15.211387244155725]
Deep neural networks trained on Functional Connectivity (FC) networks extracted from functional Magnetic Resonance Imaging (fMRI) data have gained popularity.
Recent research on the application of Graph Neural Network (GNN) to FC suggests that exploiting the time-varying properties of the FC could significantly improve the accuracy and interpretability of the model prediction.
High cost of acquiring high-quality fMRI data and corresponding labels poses a hurdle to their application in real-world settings.
We propose a generative SSL approach that is tailored to effectively harnesstemporal information within dynamic FC.
arXiv Detail & Related papers (2023-12-04T16:14:43Z) - Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.
Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z) - Fine-tuning Global Model via Data-Free Knowledge Distillation for
Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint.
We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG)
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.