LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
- URL: http://arxiv.org/abs/2509.00346v1
- Date: Sat, 30 Aug 2025 03:59:26 GMT
- Title: LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
- Authors: Xunpeng Yi, Yibing Zhang, Xinyu Xiang, Qinglong Yan, Han Xu, Jiayi Ma,
- Abstract summary: Current advanced research on infrared and visible image fusion primarily focuses on improving fusion performance. We propose a novel approach towards extremely fast fusion via distillation to learnable look-up tables specifically designed for image fusion, termed LUT-Fuse.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current advanced research on infrared and visible image fusion primarily focuses on improving fusion performance, often neglecting applicability to real-time fusion devices. In this paper, we propose a novel approach towards extremely fast fusion via distillation to learnable look-up tables specifically designed for image fusion, termed LUT-Fuse. Firstly, we develop a look-up table structure that utilizes low-order approximation encoding and high-level joint contextual scene encoding, which is well-suited for multi-modal fusion. Moreover, given the lack of ground truth in multi-modal image fusion, we propose an efficient LUT distillation strategy instead of traditional quantization-based LUT methods. By distilling the performance of the multi-modal fusion network (MM-Net) into the MM-LUT model, our method achieves significant breakthroughs in efficiency and performance. It typically requires less than one-tenth of the time of current lightweight SOTA fusion algorithms, ensuring high operational speed across various scenarios, even on low-power mobile devices. Extensive experiments validate the superiority, reliability, and stability of our fusion approach. The code is available at https://github.com/zyb5/LUT-Fuse.
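The abstract describes fusing by indexing a learned table with encoded source intensities rather than running a full network at inference time. The following is a minimal illustrative sketch of that general idea, not the paper's actual model: it assumes a simplified 2D LUT indexed by quantized infrared and visible intensities with bilinear interpolation (standing in for the low-order approximation encoding), and omits the joint contextual scene encoding entirely. All names here are hypothetical.

```python
import numpy as np

class FusionLUT:
    """Toy learnable look-up table for two-source image fusion.

    The table maps quantized (ir, vis) intensity pairs to a fused
    intensity. In a distillation setup, the table entries would be
    trained to match a teacher fusion network's outputs; here they
    are simply initialized to the per-pixel average as a baseline.
    """

    def __init__(self, bins=17):
        self.bins = bins
        ir_axis = np.linspace(0.0, 1.0, bins)[:, None]
        vis_axis = np.linspace(0.0, 1.0, bins)[None, :]
        # (bins, bins) table; average-init makes it a sane fallback.
        self.table = 0.5 * (ir_axis + vis_axis)

    def __call__(self, ir, vis):
        # Map [0, 1] intensities to fractional LUT coordinates.
        x = np.clip(ir, 0.0, 1.0) * (self.bins - 1)
        y = np.clip(vis, 0.0, 1.0) * (self.bins - 1)
        x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
        x1 = np.minimum(x0 + 1, self.bins - 1)
        y1 = np.minimum(y0 + 1, self.bins - 1)
        fx, fy = x - x0, y - y0
        # Bilinear interpolation between the four surrounding entries,
        # so the quantized table still yields a smooth output.
        t = self.table
        return ((1 - fx) * (1 - fy) * t[x0, y0]
                + fx * (1 - fy) * t[x1, y0]
                + (1 - fx) * fy * t[x0, y1]
                + fx * fy * t[x1, y1])

lut = FusionLUT()
ir = np.random.rand(8, 8)
vis = np.random.rand(8, 8)
fused = lut(ir, vis)
print(fused.shape)  # (8, 8)
```

Inference is a few array lookups and multiply-adds per pixel, which is why LUT-based methods can run far faster than a network. Since the output is linear in the table entries, the table could be fitted by gradient descent on a loss against a teacher network's fused images, which is the shape of the distillation strategy the abstract describes.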
Related papers
- Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach [99.80480649258557]
DiTFuse is an instruction-driven framework that performs semantics-aware fusion within a single model.
Experiments on public IVIF, MFF, and MEF benchmarks confirm superior quantitative and qualitative performance, sharper textures, and better semantic retention.
arXiv Detail & Related papers (2025-12-08T05:04:54Z)
- Efficient Rectified Flow for Image Fusion [48.330480065862474]
We propose RFfusion, an efficient one-step diffusion model for image fusion based on Rectified Flow.
We also propose a task-specific variational autoencoder architecture tailored for image fusion.
Our method outperforms other state-of-the-art methods in terms of both inference speed and fusion quality.
arXiv Detail & Related papers (2025-09-20T06:21:00Z)
- Task-Generalized Adaptive Cross-Domain Learning for Multimodal Image Fusion [15.666336202108862]
Multimodal Image Fusion (MMIF) aims to integrate complementary information from different imaging modalities to overcome the limitations of individual sensors.
Current MMIF methods face challenges such as modality misalignment, high-frequency detail destruction, and task-specific limitations.
We propose AdaSFFuse, a novel framework for task-generalized MMIF through adaptive cross-domain co-fusion learning.
arXiv Detail & Related papers (2025-08-21T12:31:14Z)
- An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models [18.184158874126545]
We investigate how different fusion strategies can affect vision-language alignment.
A specially designed intermediate fusion can boost text-to-image alignment with improved generation quality.
Our model achieves a higher CLIP Score and lower FID, with 20% reduced FLOPs, and 50% increased training speed.
arXiv Detail & Related papers (2024-03-25T08:16:06Z)
- A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z)
- LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images [98.36300655482196]
We formulate the fusion task mathematically, and establish a connection between its optimal solution and the network architecture that can implement it.
In particular we adopt a learnable representation approach to the fusion task, in which the construction of the fusion network architecture is guided by the optimisation algorithm producing the learnable model.
Based on this novel network architecture, an end-to-end lightweight fusion network is constructed to fuse infrared and visible light images.
arXiv Detail & Related papers (2023-04-11T12:11:23Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion [16.64908104831795]
We present a novel deep hierarchical variational autoencoder called FusionVAE that can serve as a basis for many fusion tasks.
Our approach is able to generate diverse image samples that are conditioned on multiple noisy, occluded, or only partially visible input images.
arXiv Detail & Related papers (2022-09-22T19:06:55Z)
- Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.