Related papers: Task-Customized Mixture of Adapters for General Image Fusion

URL: http://arxiv.org/abs/2403.12494v2
Date: Sun, 24 Mar 2024 03:17:24 GMT
Title: Task-Customized Mixture of Adapters for General Image Fusion
Authors: Pengfei Zhu, Yang Sun, Bing Cao, Qinghua Hu
Abstract summary: General image fusion aims at integrating important information from multi-source images. We propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusion tasks in a unified model.
Score: 51.8742437521891
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: General image fusion aims at integrating important information from multi-source images. However, due to the significant cross-task gap, the respective fusion mechanism varies considerably in practice, resulting in limited performance across subtasks. To handle this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusion tasks in a unified model. We borrow the insight from the mixture of experts (MoE), taking the experts as efficient tuning adapters to prompt a pre-trained foundation model. These adapters are shared across different tasks and constrained by mutual information regularization, ensuring compatibility with different tasks while preserving complementarity for multi-source images. The task-specific routing networks customize these adapters to extract task-specific information from different sources with dynamic dominant intensity, performing adaptive visual feature prompt fusion. Notably, our TC-MoA controls the dominant intensity bias for different fusion tasks, successfully unifying multiple fusion tasks in a single model. Extensive experiments show that TC-MoA outperforms the competing approaches in learning commonalities while retaining compatibility for general image fusion (multi-modal, multi-exposure, and multi-focus), and also demonstrates striking controllability in further generalization experiments. The code is available at https://github.com/YangSun22/TC-MoA .
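
The routing idea in the abstract (adapters shared across tasks, with one routing network per task deciding how they are mixed) can be illustrated with a small sketch. This is not the authors' implementation; see the linked repository for that. Everything here is an assumption made for illustration: the class names `Adapter` and `TaskRoutedAdapters`, the low-rank bottleneck adapter, the dense softmax gate, the random untrained weights, and the per-feature softmax used to weight the two sources. Mutual information regularization and the pre-trained foundation model are left out.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project."""

    def __init__(self, dim, rank, rng):
        self.down = rand_matrix(rank, dim, rng)
        self.up = rand_matrix(dim, rank, rng)

    def __call__(self, x):
        hidden = [max(0.0, v) for v in matvec(self.down, x)]
        return matvec(self.up, hidden)


class TaskRoutedAdapters:
    """Adapters shared by all tasks; one linear router per task."""

    def __init__(self, dim, rank, n_adapters, tasks, seed=0):
        rng = random.Random(seed)
        self.adapters = [Adapter(dim, rank, rng) for _ in range(n_adapters)]
        self.routers = {t: rand_matrix(n_adapters, dim, rng) for t in tasks}

    def prompt(self, x, task):
        # The task's router turns the feature into mixing weights over adapters.
        gate = softmax(matvec(self.routers[task], x))
        outputs = [adapter(x) for adapter in self.adapters]
        return [sum(g * out[j] for g, out in zip(gate, outputs))
                for j in range(len(x))]

    def fuse(self, x_a, x_b, task):
        # Per-feature weights over the two sources, derived from the prompts.
        p_a, p_b = self.prompt(x_a, task), self.prompt(x_b, task)
        fused = []
        for a, b, pa, pb in zip(x_a, x_b, p_a, p_b):
            w_a, w_b = softmax([pa, pb])
            fused.append(w_a * a + w_b * b)
        return fused


tasks = ["multi_modal", "multi_exposure", "multi_focus"]
model = TaskRoutedAdapters(dim=4, rank=2, n_adapters=3, tasks=tasks)
x_a = [0.2, 0.9, 0.1, 0.5]  # toy feature vector from source A
x_b = [0.8, 0.1, 0.4, 0.5]  # toy feature vector from source B
fused = model.fuse(x_a, x_b, "multi_focus")
print(fused)
```

Because the source weights for each feature sum to one, every fused value lies between the two source values. Switching the `task` argument changes only which router is used, so the same adapters produce a different mix for each task.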

Related papers

Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion [82.74585945197231]
Unified image fusion aims to integrate complementary information from multi-source images, enhancing image quality. Existing general image fusion methods incorporate explicit task identification to enable adaptation to different fusion tasks. We propose a novel unified image fusion framework named "TITA", which balances Task-invariant Interaction and Task-specific Adaptation.
arXiv Detail & Related papers (2025-04-07T15:08:35Z)
UFM: Unified Feature Matching Pre-training with Multi-Modal Image Assistants [12.756326600787629]
We introduce a Unified Feature Matching pre-trained model (UFM) designed to address feature matching challenges across a wide spectrum of modal images. We present Multimodal Image Assistant (MIA) transformers, finely tunable structures adept at handling diverse feature matching problems.
arXiv Detail & Related papers (2025-03-26T06:20:52Z)
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion [38.16599550115468]
We propose to leverage low-level vision tasks from digital photography fusion, allowing for effective feature interaction through pixel-level supervision. The proposed GIFNet supports diverse fusion tasks, achieving high performance across both seen and unseen scenarios with a single model.
arXiv Detail & Related papers (2025-02-27T07:55:19Z)
Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images. DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable. We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario. Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion. Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z)
Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion [59.19469551774703]
Infrared and visible image fusion aims to integrate comprehensive information from multiple sources to achieve superior performance on various practical tasks. We propose a dynamic image fusion framework with a multi-modal gated mixture of local-to-global experts. Our model consists of a Mixture of Local Experts (MoLE) and a Mixture of Global Experts (MoGE) guided by a multi-modal gate.
arXiv Detail & Related papers (2023-02-02T20:06:58Z)
TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning [5.849513679510834]
Image fusion is a technique to integrate information from multiple source images with complementary information to improve the richness of a single image. Two-stage methods avoid the need for large amounts of task-specific training data by training an encoder-decoder network on large natural image datasets. We propose a destruction-reconstruction based self-supervised training scheme to encourage the network to learn task-specific features.
arXiv Detail & Related papers (2022-01-19T07:30:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.