Related papers: MambaLoc: Efficient Camera Localisation via State Space Model

MambaLoc: Efficient Camera Localisation via State Space Model

URL: http://arxiv.org/abs/2408.09680v2
Date: Tue, 20 Aug 2024 08:44:42 GMT
Title: MambaLoc: Efficient Camera Localisation via State Space Model
Authors: Jialu Wang, Kaichen Zhou, Andrew Markham, Niki Trigoni,
Abstract summary: Location information is pivotal for the automation and intelligence of terminal devices and edge-cloud IoT systems, such as autonomous vehicles and augmented reality. achieving reliable positioning across diverse IoT applications remains challenging due to significant training costs and the necessity of densely collected data. We have innovatively applied the selective state space (SSM) model to visual localization, introducing a new model named MambaLoc.
Score: 42.85368902409545
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Location information is pivotal for the automation and intelligence of terminal devices and edge-cloud IoT systems, such as autonomous vehicles and augmented reality. However, achieving reliable positioning across diverse IoT applications remains challenging due to significant training costs and the necessity of densely collected data. To tackle these issues, we have innovatively applied the selective state space (SSM) model to visual localization, introducing a new model named MambaLoc. The proposed model demonstrates exceptional training efficiency by capitalizing on the SSM model's strengths in efficient feature extraction, rapid computation, and memory optimization, and it further ensures robustness in sparse data environments due to its parameter sparsity. Additionally, we propose the Global Information Selector (GIS), which leverages selective SSM to implicitly achieve the efficient global feature extraction capabilities of Non-local Neural Networks. This design leverages the computational efficiency of the SSM model alongside the Non-local Neural Networks' capacity to capture long-range dependencies with minimal layers. Consequently, the GIS enables effective global information capture while significantly accelerating convergence. Our extensive experimental validation using public indoor and outdoor datasets first demonstrates our model's effectiveness, followed by evidence of its versatility with various existing localization models. Our code and models are publicly available to support further research and development in this area.

Related papers

Federated Dynamic Modeling and Learning for Spatiotemporal Data Forecasting [0.8568432695376288]
This paper presents an advanced Federated Learning (FL) framework for forecasting complextemporal data, improving upon recent state-of-the-art models. The resulting architecture significantly improves the model's capacity to handle complex temporal patterns in diverse forecasting applications. The efficiency of our approach is demonstrated through extensive experiments on real-world applications, including public datasets for multimodal transport demand forecasting and private datasets for Origin-Destination (OD) matrix forecasting in urban areas.
arXiv Detail & Related papers (2025-03-06T15:16:57Z)
Meta-Learning for Physically-Constrained Neural System Identification [9.417562391585076]
We present a gradient-based meta-learning framework for rapid adaptation of neural state-space models (NSSMs) for black-box system identification. We show that the meta-learned models result in improved downstream performance in model-based state estimation in indoor localization and energy systems.
arXiv Detail & Related papers (2025-01-10T18:46:28Z)
Hyperspectral Images Efficient Spatial and Spectral non-Linear Model with Bidirectional Feature Learning [7.06787067270941]
We propose a novel framework that significantly reduces data volume while enhancing classification accuracy. Our model employs a bidirectional reversed convolutional neural network (CNN) to efficiently extract spectral features, complemented by a specialized block for spatial feature analysis.
arXiv Detail & Related papers (2024-11-29T23:32:26Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution [1.8029479474051309]
We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary. Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain. Experimental results on the action classification task show that our framework not only requires less computational overhead, but also improves accuracy compared to using a large model alone.
arXiv Detail & Related papers (2024-10-16T02:06:27Z)
Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios. We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z)
Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations [1.5723316845301678]
This report introduces a novel methodology for training with augmentations to enhance model robustness and performance in such conditions. We present a comprehensive framework that includes identifying weak spots in Machine Learning models, selecting suitable augmentations, and devising effective training strategies. Experimental results demonstrate improvements in model performance, as measured by commonly used metrics such as mean Average Precision (mAP) and mean Intersection over Union (mIoU) on open-source object detection and semantic segmentation models and datasets.
arXiv Detail & Related papers (2024-08-30T14:15:48Z)
Self-Supervised Learning for User Localization [8.529237718266042]
Machine learning techniques have shown remarkable accuracy in localization tasks. Their dependency on vast amounts of labeled data, particularly Channel State Information (CSI) and corresponding coordinates, remains a bottleneck. We propose a pioneering approach that leverages self-supervised pretraining on unlabeled data to boost the performance of supervised learning for user localization based on CSI.
arXiv Detail & Related papers (2024-04-19T21:49:10Z)
Interpreting and Improving Attention From the Perspective of Large Kernel Convolution [51.06461246235176]
We introduce Large Kernel Convolutional Attention (LKCA), a novel formulation that reinterprets attention operations as a single large- Kernel convolution. LKCA achieves competitive performance across various visual tasks, particularly in data-constrained settings.
arXiv Detail & Related papers (2024-01-11T08:40:35Z)
Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data. One key challenge in federated learning is to handle non-identically distributed data across the clients. We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
A Generative Self-Supervised Framework using Functional Connectivity in fMRI Data [15.211387244155725]
Deep neural networks trained on Functional Connectivity (FC) networks extracted from functional Magnetic Resonance Imaging (fMRI) data have gained popularity. Recent research on the application of Graph Neural Network (GNN) to FC suggests that exploiting the time-varying properties of the FC could significantly improve the accuracy and interpretability of the model prediction. High cost of acquiring high-quality fMRI data and corresponding labels poses a hurdle to their application in real-world settings. We propose a generative SSL approach that is tailored to effectively harnesstemporal information within dynamic FC.
arXiv Detail & Related papers (2023-12-04T16:14:43Z)
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective: to promote superior weight sparsity. Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model. Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint. We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG) Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.