Related papers: Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment

Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment

URL: http://arxiv.org/abs/2404.18253v5
Date: Tue, 28 May 2024 16:24:03 GMT
Title: Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
Authors: Tengjun Huang,
Abstract summary: "Harmonized Transfer Learning and Modality alignment (HarMA)" is a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment. HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rise of Visual and Language Pretraining (VLP), an increasing number of downstream tasks are adopting the paradigm of pretraining followed by fine-tuning. Although this paradigm has demonstrated potential in various multimodal downstream tasks, its implementation in the remote sensing domain encounters some obstacles. Specifically, the tendency for same-modality embeddings to cluster together impedes efficient transfer learning. To tackle this issue, we review the aim of multimodal transfer learning for downstream tasks from a unified perspective, and rethink the optimization process based on three distinct objectives. We propose "Harmonized Transfer Learning and Modality Alignment (HarMA)", a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment, while minimizing training overhead through parameter-efficient fine-tuning. Remarkably, without the need for external data for training, HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing. Our experiments reveal that HarMA achieves competitive and even superior performance to fully fine-tuned models with only minimal adjustable parameters. Due to its simplicity, HarMA can be integrated into almost all existing multimodal pretraining models. We hope this method can facilitate the efficient application of large models to a wide range of downstream tasks while significantly reducing the resource consumption. Code is available at https://github.com/seekerhuang/HarMA.

Related papers

MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving [12.330414519761524]
Trajectory planning is a core task in autonomous driving.<n>MLLMs with Reinforcement Learning have shown promise in addressing "long-tail" scenarios.<n>We present MTDrive, a multi-turn framework that enables MLLMs to iteratively refine trajectories based on environmental feedback.
arXiv Detail & Related papers (2026-01-30T12:47:55Z)
Improving Multimodal Learning Balance and Sufficiency through Data Remixing [14.282792733217653]
Methods for enforcing the weak modality fail to achieve unimodal sufficiency and multimodal balance.<n>We propose multimodal Data Remixing, including decoupling multimodal data and filtering hard samples for each modality to mitigate modality imbalance.<n>Our method can be seamlessly integrated with existing approaches, improving accuracy by approximately 6.50%$uparrow$ on CREMAD and 3.41%$uparrow$ on Kinetic-Sounds.
arXiv Detail & Related papers (2025-06-13T08:01:29Z)
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners [60.75160178669076]
We show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online reinforcement learning.<n>We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL.
arXiv Detail & Related papers (2025-05-29T06:41:45Z)
Robust Multimodal Learning via Cross-Modal Proxy Tokens [11.704477276235847]
Multimodal models often experience a significant performance drop when one or more modalities are missing during inference. We propose a simple yet effective approach that enhances robustness to missing modalities while maintaining strong performance when all modalities are available. Our method introduces cross-modal proxy tokens (CMPTs), which approximate the class token of a missing modality by attending only to the tokens of the available modality.
arXiv Detail & Related papers (2025-01-29T18:15:49Z)
Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation. In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales. Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model [5.643658120200373]
AdapMTL is an adaptive pruning framework for multitask models. It balances sparsity allocation and accuracy performance across multiple tasks. It showcases superior performance compared to state-of-the-art pruning methods.
arXiv Detail & Related papers (2024-08-07T17:19:15Z)
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning [28.12788291168137]
We present a multi-task fine-tuning framework, MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks. Experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks.
arXiv Detail & Related papers (2023-11-04T02:22:40Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging) It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs) MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
Efficient Multimodal Fusion via Interactive Prompting [62.08292938484994]
Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. We propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers.
arXiv Detail & Related papers (2023-04-13T07:31:51Z)
eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
Mitigating Negative Transfer in Multi-Task Learning with Exponential Moving Average Loss Weighting Strategies [0.981328290471248]
Multi-Task Learning (MTL) is a growing subject of interest in deep learning. MTL can be impractical as certain tasks can dominate training and hurt performance in others. We propose techniques for loss balancing based on scaling by the exponential moving average.
arXiv Detail & Related papers (2022-11-22T09:22:48Z)
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks. We find that their performances are sub-optimal or even lag far behind the single-task baseline. We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.