PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening
- URL: http://arxiv.org/abs/2409.06980v1
- Date: Wed, 11 Sep 2024 03:13:08 GMT
- Title: PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening
- Authors: RuoCheng Wu, ZiEn Zhang, ShangQi Deng, YuLe Duan, LiangJian Deng
- Abstract summary: We propose an efficient fine-tuning method, namely PanAdapter, to alleviate the issue of small-scale datasets in pansharpening tasks.
We fine-tune the pre-trained CNN model and extract task-specific priors at two scales with the proposed Local Prior Extraction (LPE) module.
We demonstrate that our approach can benefit from pre-trained image restoration models and achieve state-of-the-art performance on several benchmark pansharpening datasets.
- Score: 8.916207546866048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pansharpening is a challenging image fusion task that involves restoring images from two different modalities: low-resolution multispectral (LRMS) and high-resolution panchromatic (PAN) images. Many end-to-end specialized models based on deep learning (DL) have been proposed, yet the scale and performance of these models are limited by the size of the available datasets. Given the superior parameter scales and feature representations of pre-trained models, they exhibit outstanding performance when transferred to downstream tasks with small datasets. Therefore, we propose an efficient fine-tuning method, namely PanAdapter, which utilizes additional advanced semantic information from pre-trained models to alleviate the issue of small-scale datasets in pansharpening tasks. Specifically, targeting the large domain discrepancy between image restoration and pansharpening tasks, PanAdapter adopts a two-stage training strategy to progressively adapt to the downstream task. In the first stage, we fine-tune the pre-trained CNN model and extract task-specific priors at two scales with the proposed Local Prior Extraction (LPE) module. In the second stage, we feed the two extracted priors into the two branches of cascaded adapters respectively. At each adapter, we design two parameter-efficient modules that allow the two branches to interact and be injected into the frozen pre-trained Vision Transformer (ViT) blocks. We demonstrate that by training only the proposed LPE modules and adapters, which contain a small number of parameters, our approach can benefit from pre-trained image restoration models and achieve state-of-the-art performance on several benchmark pansharpening datasets. The code will be available soon.
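To make the adapter-injection pattern described in the abstract concrete, here is a minimal PyTorch sketch of a bottleneck adapter wrapped around a frozen pre-trained block. It is only an illustration of the general parameter-efficient idea: the class names, dimensions, and bottleneck design are hypothetical, not the paper's actual LPE modules or cascaded two-branch adapters.

```python
# Minimal sketch of adapter-style fine-tuning on a frozen pre-trained block.
# All names and shapes are hypothetical illustrations, not PanAdapter itself.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, added residually."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # adapter starts as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    """Wraps a frozen pre-trained block; only the adapter receives gradients."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # keep pre-trained weights fixed
        self.adapter = BottleneckAdapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))

# Example: adapt one frozen transformer layer on dummy token features.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
model = AdaptedBlock(layer, dim=768)
out = model(torch.randn(2, 196, 768))  # (batch, tokens, dim)
```

Because only the adapter parameters keep `requires_grad=True`, an optimizer built from the trainable parameters updates a small fraction of the network, mirroring the abstract's claim of training only lightweight modules.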
Related papers
- DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models [103.18486625853099]
DEFT, Decompositional Efficient Fine-Tuning, adapts a pre-trained weight matrix by decomposing its update into two components.
We conduct experiments on the Dreambooth and Dreambench Plus datasets for personalization, the InsDet dataset for object and scene adaptation, and the VisualCloze dataset for a universal image generation framework.
arXiv Detail & Related papers (2025-09-26T18:01:15Z) - Fine-grained Image Retrieval via Dual-Vision Adaptation [32.27084080471636]
Fine-Grained Image Retrieval (FGIR) faces challenges in learning discriminative visual representations to retrieve images with similar fine-grained features.
We propose a Dual-Vision Adaptation (DVA) approach for FGIR, which guides the frozen pre-trained model to perform FGIR through collaborative sample and feature adaptation.
arXiv Detail & Related papers (2025-06-19T12:46:55Z) - Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition [33.22316608406554]
We propose a parameter-efficient dual adaptation method for both image and text encoders.
Specifically, we design a task-specific adaptation for the image encoder so that the most discriminative information can be well noticed during feature extraction.
We develop an innovative fine-grained cross-modal alignment strategy that actively maps visual features to reside in the same temporal stage as semantic descriptions.
arXiv Detail & Related papers (2025-05-09T12:34:10Z) - Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC [77.8851460746251]
We propose a simple, efficient, and general approach to fine-tune diffusion models.
ONE-PIC enhances the inherited generative ability of pretrained diffusion models without introducing additional modules.
Our method is simple and efficient; it streamlines the adaptation process and achieves excellent performance at lower cost.
arXiv Detail & Related papers (2024-12-07T11:19:32Z) - Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation [31.023236232633213]
Recent adaptations can boost the low-shot capability of Contrastive Vision-Language Pre-training.
We propose a Visual Content Refinement (VCR) step applied before the adaptation calculation at test time.
We apply our method to 3 popular low-shot benchmark tasks with 13 datasets and achieve a significant improvement over state-of-the-art methods.
arXiv Detail & Related papers (2024-07-19T08:34:23Z) - Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to the formulation of pretraining as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models [10.713680139939354]
Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks.
Parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning.
We propose a new adapter architecture, $p$-adapter, which employs $p$-Laplacian message passing in Graph Neural Networks (GNNs).
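For reference, the standard discrete $p$-Laplacian on a weighted graph, which such message passing builds on, is sketched below; this is the textbook form, and the paper's exact adapter update may differ.

```latex
% Standard p-Laplacian of node features f on a weighted graph.
% Illustrative textbook form, not the paper's exact adapter update.
(\Delta_p f)(i) = \sum_{j \in \mathcal{N}(i)} w_{ij}\,
  \lVert f_i - f_j \rVert^{p-2} \,(f_i - f_j)
```

For $p = 2$ this reduces to the ordinary graph Laplacian used in GNN message passing, so choosing $p \neq 2$ makes the aggregation nonlinear in the feature differences.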
arXiv Detail & Related papers (2023-12-17T05:30:35Z) - PAD: Self-Supervised Pre-Training with Patchwise-Scale Adapter for Infrared Images [45.507517332100804]
Self-supervised learning (SSL) for RGB images has achieved significant success, yet there is still limited research on SSL for infrared images.
Non-iconic infrared images render common pre-training tasks less effective.
The scarcity of fine-grained textures makes it particularly challenging to learn general image features.
arXiv Detail & Related papers (2023-12-13T14:57:28Z) - Class Incremental Learning with Pre-trained Vision-Language Models [59.15538370859431]
We propose an approach to exploiting pre-trained vision-language models (e.g. CLIP) that enables further adaptation.
Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
arXiv Detail & Related papers (2023-10-31T10:45:03Z) - Cross-Modal Adapter for Vision-Language Retrieval [60.59577149733934]
We present a novel Cross-Modal Adapter for parameter-efficient transfer learning.
Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterization layers.
Our approach has three notable benefits: (1) it reduces the number of fine-tuned parameters by a vast majority, (2) it saves training time, and (3) it allows all the pre-trained parameters to remain fixed.
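To make benefits (1) and (3) concrete, the following small sketch freezes a generic Transformer backbone, attaches small adapters, and measures how small the trainable fraction is. The backbone and adapter shapes are illustrative stand-ins, not the Cross-Modal Adapter's actual architecture.

```python
# Hypothetical illustration of parameter-efficient transfer: freeze a generic
# backbone, attach small adapters, and report the trainable share.
import torch.nn as nn

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12,
)
for p in backbone.parameters():
    p.requires_grad = False  # benefit (3): pre-trained weights stay fixed

adapters = nn.ModuleList(
    [nn.Sequential(nn.Linear(768, 64), nn.GELU(), nn.Linear(64, 768))
     for _ in range(12)]
)

frozen = sum(p.numel() for p in backbone.parameters())
trainable = sum(p.numel() for p in adapters.parameters())
print(f"trainable share: {trainable / (frozen + trainable):.1%}")  # roughly 1-2%
```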
arXiv Detail & Related papers (2022-11-17T16:15:30Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting [94.11915008006483]
We propose a novel Point-to-Pixel prompting for point cloud analysis.
Our method attains 89.3% accuracy on the hardest setting of ScanObjectNN.
Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet part segmentation.
arXiv Detail & Related papers (2022-08-04T17:59:03Z) - Pro-tuning: Unified Prompt Tuning for Vision Tasks [133.12978197265596]
Fine-tuning is the de-facto approach to leverage pre-trained vision models to perform downstream tasks.
In this work, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.
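For contrast with adapter-style methods, the prompt-tuning idea that Pro-tuning builds on can be sketched as learnable tokens prepended to a frozen backbone's input. The snippet below is a generic, hypothetical illustration; Pro-tuning's actual prompt design is task-specific and described in the paper.

```python
# Hypothetical prompt-tuning sketch: learnable prompt tokens are concatenated
# to the token embeddings of a frozen backbone; only the prompts are trained.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int, n_prompts: int = 8):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)

    def forward(self, tokens):  # tokens: (batch, n_tokens, dim)
        b = tokens.size(0)
        x = torch.cat([self.prompts.expand(b, -1, -1), tokens], dim=1)
        return self.encoder(x)

# Example with a generic frozen encoder and dummy token features.
enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
model = PromptedEncoder(enc, dim=768)
out = model(torch.randn(2, 196, 768))  # prompts are prepended internally
```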
arXiv Detail & Related papers (2022-07-28T21:09:31Z) - Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely, the image processing transformer (IPT).
We utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multiple heads and multiple tails.
arXiv Detail & Related papers (2020-12-01T09:42:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.