Efficient Adaptation For Remote Sensing Visual Grounding
- URL: http://arxiv.org/abs/2503.23083v1
- Date: Sat, 29 Mar 2025 13:49:11 GMT
- Title: Efficient Adaptation For Remote Sensing Visual Grounding
- Authors: Hasan Moughnieh, Mohamad Chalhoub, Hasan Nasrallah, Cristiano Nattero, Paolo Campanella, Ali J. Ghandour,
- Abstract summary: Foundation models can associate textual descriptions with object positions through the Visual Grounding (VG) task.<n>Due to domain-specific challenges, their direct application to remote sensing (RS) produces sub-optimal results.<n>This study highlights the potential of PEFT techniques to advance efficient and precise multi-modal analysis in RS.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundation models have revolutionized artificial intelligence (AI), offering remarkable capabilities across multi-modal domains. Their ability to precisely locate objects in complex aerial and satellite images, using rich contextual information and detailed object descriptions, is essential for remote sensing (RS). These models can associate textual descriptions with object positions through the Visual Grounding (VG) task, but due to domain-specific challenges, their direct application to RS produces sub-optimal results. To address this, we applied Parameter Efficient Fine Tuning (PEFT) techniques to adapt these models for RS-specific VG tasks. Specifically, we evaluated LoRA placement across different modules in Grounding DINO and used BitFit and adapters to fine-tune the OFA foundation model pre-trained on general-purpose VG datasets. This approach achieved performance comparable to or surpassing current State Of The Art (SOTA) models while significantly reducing computational costs. This study highlights the potential of PEFT techniques to advance efficient and precise multi-modal analysis in RS, offering a practical and cost-effective alternative to full model training.
Related papers
- Segmentation of arbitrary features in very high resolution remote sensing imagery [0.0]
We introduce EcoMapper, a scalable solution to segment arbitrary features in VHR RS imagery.<n>Models trained with EcoMapper successfully segmented two distinct features in a real-world UAV dataset.<n>A comprehensive methodology for field surveys was developed to ensure DL methods can be applied effectively to collected data.
arXiv Detail & Related papers (2024-12-20T16:48:52Z) - RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering [23.699493284403967]
This paper proposes RS-MoE, a first Mixture of Expert based VLM specifically customized for remote sensing domain.<n>Unlike traditional MoE models, the core of RS-MoE is the MoE Block, which incorporates a novel Instruction Router and multiple lightweight Large Language Models (LLMs) as expert models.<n>We show that our model achieves state-of-the-art performance in generating precise and contextually relevant captions.
arXiv Detail & Related papers (2024-11-03T15:05:49Z) - Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning [65.31677646659895]
This paper focuses on the concept of task-specific directions (TSDs)-critical for transitioning large models from pretrained states to task-specific enhancements in PEFT.<n>We introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process, thereby enhancing model performance on targeted tasks.
arXiv Detail & Related papers (2024-09-02T08:10:51Z) - Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning [50.332027356848094]
AI-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control.
The mapping between context and AI model parameters is ideally done in a zero-shot fashion.
This paper introduces a general methodology for the online optimization of AMS mappings.
arXiv Detail & Related papers (2024-06-22T11:17:50Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Parameter-Efficient Transfer Learning for Remote Sensing Image-Text
Retrieval [10.84733740863356]
In this work, we investigate the parameter-efficient transfer learning (PETL) method to transfer visual-language knowledge from the natural domain to the RS domain on the image-text retrieval task.
Our proposed model only contains 0.16M training parameters, which can achieve a parameter reduction of 98.9% compared to full fine-tuning.
Our retrieval performance exceeds traditional methods by 7-13% and achieves comparable or better performance than full fine-tuning.
arXiv Detail & Related papers (2023-08-24T02:43:53Z) - Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z) - Shared Space Transfer Learning for analyzing multi-site fMRI data [83.41324371491774]
Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data.
MVPA works best with a well-designed feature set and an adequate sample size.
Most fMRI datasets are noisy, high-dimensional, expensive to collect, and with small sample sizes.
This paper proposes the Shared Space Transfer Learning (SSTL) as a novel transfer learning approach.
arXiv Detail & Related papers (2020-10-24T08:50:26Z) - ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework
for LiDAR Point Cloud Segmentation [111.56730703473411]
Training deep neural networks (DNNs) on LiDAR data requires large-scale point-wise annotations.
Simulation-to-real domain adaptation (SRDA) trains a DNN using unlimited synthetic data with automatically generated labels.
ePointDA consists of three modules: self-supervised dropout noise rendering, statistics-invariant and spatially-adaptive feature alignment, and transferable segmentation learning.
arXiv Detail & Related papers (2020-09-07T23:46:08Z) - Data Techniques For Online End-to-end Speech Recognition [17.621967685914587]
Practitioners often need to build ASR systems for new use cases in a short amount of time, given limited in-domain data.
While recently developed end-to-end methods largely simplify the modeling pipelines, they still suffer from the data sparsity issue.
We explore a few simple-to-implement techniques for building online ASR systems in an end-to-end fashion, with a small amount of transcribed data in the target domain.
arXiv Detail & Related papers (2020-01-24T22:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.