Robust Fusion Controller: Degradation-aware Image Fusion with Fine-grained Language Instructions
- URL: http://arxiv.org/abs/2504.05795v2
- Date: Wed, 09 Apr 2025 10:05:59 GMT
- Title: Robust Fusion Controller: Degradation-aware Image Fusion with Fine-grained Language Instructions
- Authors: Hao Zhang, Yanping Zha, Qingwei Zhuang, Zhenfeng Shao, Jiayi Ma
- Abstract summary: Current image fusion methods struggle to adapt to real-world environments encompassing diverse degradations with spatially varying characteristics. We propose a robust fusion controller capable of achieving degradation-aware image fusion through fine-grained language instructions. Our RFC is robust against various composite degradations, particularly in highly challenging flare scenarios.
- Score: 26.269399073437903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current image fusion methods struggle to adapt to real-world environments encompassing diverse degradations with spatially varying characteristics. To address this challenge, we propose a robust fusion controller (RFC) capable of achieving degradation-aware image fusion through fine-grained language instructions, ensuring its reliable application in adverse environments. Specifically, RFC first parses language instructions to derive the functional condition and the spatial condition, where the former specifies the degradation type to remove and the latter defines its spatial coverage. Then, a composite control prior is generated through a multi-condition coupling network, achieving a seamless transition from abstract language instructions to latent control variables. Subsequently, we design a hybrid attention-based fusion network to aggregate multi-modal information, in which the obtained composite control prior is deeply embedded to linearly modulate the intermediate fused features. To ensure alignment between language instructions and control outcomes, we introduce a novel language-feature alignment loss, which constrains the consistency between feature-level gains and the composite control prior. Extensive experiments on publicly available datasets demonstrate that our RFC is robust against various composite degradations, particularly in highly challenging flare scenarios.
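To make the modulation step concrete, here is a minimal PyTorch sketch of how a composite control prior could linearly modulate intermediate fused features, FiLM-style. The module name, the prior/feature dimensions, and the (1 + gain) parameterization are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class ControlModulation(nn.Module):
    """Hypothetical FiLM-style modulation of fused features by a control prior."""
    def __init__(self, prior_dim: int, feat_channels: int):
        super().__init__()
        # Map the latent control prior to per-channel gain and bias.
        self.to_gain = nn.Linear(prior_dim, feat_channels)
        self.to_bias = nn.Linear(prior_dim, feat_channels)

    def forward(self, fused: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # fused: (B, C, H, W) intermediate fused features
        # prior: (B, prior_dim) composite control prior
        gain = self.to_gain(prior)[:, :, None, None]   # (B, C, 1, 1)
        bias = self.to_bias(prior)[:, :, None, None]
        return fused * (1.0 + gain) + bias             # linear modulation

# Toy usage: 64-channel features modulated by a 128-d control prior.
mod = ControlModulation(prior_dim=128, feat_channels=64)
print(mod(torch.randn(2, 64, 32, 32), torch.randn(2, 128)).shape)  # (2, 64, 32, 32)
```

The language-feature alignment loss described in the abstract would then constrain such feature-level gains to stay consistent with the composite control prior.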
Related papers
- ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts [58.99648692413168]
Current image fusion methods struggle to address the composite degradations encountered in real-world imaging scenarios. We propose ControlFusion, which adaptively neutralizes composite degradations. In experiments, ControlFusion outperforms SOTA fusion methods in fusion quality and degradation handling.
arXiv Detail & Related papers (2025-03-30T08:18:53Z)
- Constrained Language Generation with Discrete Diffusion Models [61.81569616239755]
We present Constrained Discrete Diffusion (CDD), a novel method for enforcing constraints on natural language by integrating discrete diffusion models with differentiable optimization. We show how this technique can be applied to satisfy a variety of natural language constraints, including (i) toxicity mitigation by preventing harmful content from emerging, (ii) character- and sequence-level lexical constraints, and (iii) novel molecule sequence generation with specific property adherence.
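As a rough illustration of constraint enforcement during iterative discrete decoding, the toy sketch below hard-masks disallowed token logits at each step. CDD itself integrates differentiable optimization; this hard-masking simplification only conveys the flavor of lexical constraints holding by construction.

```python
import torch

def constrained_step(logits: torch.Tensor, banned_ids: list) -> torch.Tensor:
    # logits: (T, V) per-position token logits at one refinement step.
    logits = logits.clone()
    logits[:, banned_ids] = float("-inf")   # banned tokens get zero probability
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)  # (T,) sampled ids

tokens = constrained_step(torch.randn(8, 100), banned_ids=[3, 17, 42])
print(sorted(set(tokens.tolist()) & {3, 17, 42}))  # [] -- constraint holds
```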
arXiv Detail & Related papers (2025-03-12T19:48:12Z)
- PixelPonder: Dynamic Patch Adaptation for Enhanced Multi-Conditional Text-to-Image Generation [24.964136963713102]
We present PixelPonder, a novel unified control framework that allows for effective control of multiple visual conditions under a single control structure. Specifically, we design a patch-level adaptive condition selection mechanism that dynamically prioritizes spatially relevant control signals at the sub-region level. Extensive experiments demonstrate that PixelPonder surpasses previous methods across different benchmark datasets.
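A hedged reading of the patch-level selection idea: per-patch relevance scores softmax-weight several stacked condition maps, so each sub-region follows its most relevant control signal. All module names and shapes below are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchConditionSelect(nn.Module):
    def __init__(self, channels: int, patch: int = 8):
        super().__init__()
        self.patch = patch
        # One relevance logit per condition map per patch.
        self.score = nn.Conv2d(channels, 1, kernel_size=patch, stride=patch)

    def forward(self, conds: torch.Tensor) -> torch.Tensor:
        # conds: (B, K, C, H, W) stacked condition maps; H, W divisible by patch
        B, K, C, H, W = conds.shape
        logits = self.score(conds.flatten(0, 1))                 # (B*K, 1, H/p, W/p)
        logits = logits.view(B, K, 1, H // self.patch, W // self.patch)
        weights = F.softmax(logits, dim=1)                       # conditions compete per patch
        weights = F.interpolate(weights.flatten(0, 1),
                                scale_factor=self.patch, mode="nearest")
        return (weights.view(B, K, 1, H, W) * conds).sum(dim=1)  # (B, C, H, W)

# Toy usage: three condition maps, 64 channels, 32x32 features.
sel = PatchConditionSelect(channels=64)
print(sel(torch.randn(2, 3, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```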
arXiv Detail & Related papers (2025-03-09T16:27:02Z)
- PICASO: Permutation-Invariant Context Composition with State Space Models [98.91198288025117]
State Space Models (SSMs) offer a promising solution by allowing a database of contexts to be mapped onto fixed-dimensional states. We propose a simple mathematical relation derived from SSM dynamics to compose multiple states into one that efficiently approximates the effect of concatenating raw context tokens. We evaluate the resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that it matches the strongest-performing baseline while delivering an average 5.4x speedup.
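The linear-SSM identity that motivates this kind of state composition can be checked numerically: for x_t = A x_{t-1} + B u_t started from zero, the state after a concatenated context c1‖c2 equals A^len(c2) s(c1) + s(c2). The toy below verifies this identity; it illustrates the underlying dynamics, not PICASO's exact permutation-invariant composition rule.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 3                        # state and input dimensions (toy sizes)
A = 0.3 * rng.normal(size=(d, d))  # contraction keeps the toy stable
B = rng.normal(size=(d, m))

def state_after(tokens: np.ndarray) -> np.ndarray:
    """State reached from zero after running the SSM over `tokens`."""
    x = np.zeros(d)
    for u in tokens:
        x = A @ x + B @ u
    return x

c1 = rng.normal(size=(5, m))
c2 = rng.normal(size=(7, m))

s_concat = state_after(np.concatenate([c1, c2]))
s_composed = np.linalg.matrix_power(A, len(c2)) @ state_after(c1) + state_after(c2)
print(np.allclose(s_concat, s_composed))  # True: composition matches concatenation
```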
arXiv Detail & Related papers (2025-02-24T19:48:00Z)
- Reconciling Semantic Controllability and Diversity for Remote Sensing Image Synthesis with Hybrid Semantic Embedding [12.330893658398042]
We present a Hybrid Semantic Embedding Guided Generative Adversarial Network (HySEGGAN) for controllable and efficient remote sensing image synthesis.
Motivated by feature description, we propose a hybrid semantic embedding method that coordinates fine-grained local semantic layouts.
A Semantic Refinement Network (SRN) is introduced, incorporating a novel loss function to ensure fine-grained semantic feedback.
arXiv Detail & Related papers (2024-11-22T07:51:36Z)
- HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition [17.412985505938508]
Internal Language Model (LM)-based methods use permutation language modeling (PLM) to address the error-correction problem caused by conditional independence in external LM-based methods.
This paper proposes the Hierarchical Attention autoregressive Model with Adaptive Permutation (HAAP) to enhance the location-context-image interaction capability.
arXiv Detail & Related papers (2024-05-15T06:41:43Z)
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-stage strategy: distributing language features, spatial semantic recurrent co-parsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks [10.880057430629126]
Disentangled latent spaces usually have better semantic separability and geometrical properties, which leads to better interpretability and more controllable data generation.
In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features.
We introduce a flow-based invertible neural network (INN) mechanism integrated with a transformer-based language autoencoder (AE) in order to deliver latent spaces with better separability properties.
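For intuition, an additive coupling layer of the kind used in flow-based INNs is exactly invertible, so edits made in the transformed latent space map back to the autoencoder latent losslessly. The sketch below is a generic RealNVP-style layer with illustrative sizes, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """RealNVP-style additive coupling: exactly invertible by construction."""
    def __init__(self, dim: int):
        super().__init__()
        self.half = dim // 2
        self.shift = nn.Sequential(
            nn.Linear(self.half, self.half), nn.ReLU(),
            nn.Linear(self.half, dim - self.half),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        a, b = z[:, :self.half], z[:, self.half:]
        return torch.cat([a, b + self.shift(a)], dim=1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        a, b = y[:, :self.half], y[:, self.half:]
        return torch.cat([a, b - self.shift(a)], dim=1)

# Round trip over a batch of hypothetical sentence latents.
layer = AdditiveCoupling(dim=16)
z = torch.randn(4, 16)
print(torch.allclose(layer.inverse(layer(z)), z, atol=1e-6))  # True
```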
arXiv Detail & Related papers (2023-05-02T18:27:13Z)
- An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation [70.77243918587321]
Multi-aspect controllable text generation, which steers generated text along multiple aspects at once, has attracted increasing attention.
We provide a theoretical lower bound for the interference between aspects and empirically find that the interference grows with the number of layers where prefixes are inserted.
We propose using trainable gates to normalize the intervention of prefixes to restrain the growing interference.
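One way to read the trainable-gate idea: a learned scalar per aspect shrinks its prefix before it intervenes, bounding how strongly stacked prefixes interfere. The sketch below gates prefixes prepended to hidden states; the shapes and the sigmoid gate placement are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedPrefix(nn.Module):
    def __init__(self, num_aspects: int, prefix_len: int, dim: int):
        super().__init__()
        self.prefixes = nn.Parameter(torch.randn(num_aspects, prefix_len, dim) * 0.02)
        self.gate_logits = nn.Parameter(torch.zeros(num_aspects))  # sigmoid -> 0.5 at init

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (B, T, dim) layer input; prepend all gated aspect prefixes.
        gates = torch.sigmoid(self.gate_logits).view(-1, 1, 1)   # (A, 1, 1)
        gated = (gates * self.prefixes).flatten(0, 1)            # (A*P, dim)
        gated = gated.unsqueeze(0).expand(hidden.size(0), -1, -1)
        return torch.cat([gated, hidden], dim=1)                 # (B, A*P+T, dim)

# Toy usage: three aspect prefixes of length 4 prepended to a length-10 input.
gp = GatedPrefix(num_aspects=3, prefix_len=4, dim=32)
print(gp(torch.randn(2, 10, 32)).shape)  # torch.Size([2, 22, 32])
```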
arXiv Detail & Related papers (2022-12-19T11:53:59Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
When paired with a strong auto-regressive decoder, VAEs tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)