FusionCounting: Robust visible-infrared image fusion guided by crowd counting via multi-task learning
- URL: http://arxiv.org/abs/2508.20817v2
- Date: Sun, 31 Aug 2025 17:52:52 GMT
- Title: FusionCounting: Robust visible-infrared image fusion guided by crowd counting via multi-task learning
- Authors: He Li, Xinyu Liu, Weihang Kong, Xingchen Zhang
- Abstract summary: Visible and infrared image fusion (VIF) is an important multimedia task in computer vision. Recent studies have begun incorporating downstream tasks, such as semantic segmentation and object detection, to provide semantic guidance for VIF. We propose FusionCounting, a novel multi-task learning framework that integrates crowd counting into the VIF process.
- Score: 16.955260249719533
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Visible and infrared image fusion (VIF) is an important multimedia task in computer vision. Most VIF methods focus primarily on optimizing fused image quality. Recent studies have begun incorporating downstream tasks, such as semantic segmentation and object detection, to provide semantic guidance for VIF. However, semantic segmentation requires extensive annotations, while object detection, despite reducing annotation effort compared with segmentation, faces challenges in highly crowded scenes due to overlapping bounding boxes and occlusion. Moreover, although RGB-T crowd counting has gained increasing attention in recent years, no studies have integrated VIF and crowd counting into a unified framework. To address these challenges, we propose FusionCounting, a novel multi-task learning framework that integrates crowd counting into the VIF process. Crowd counting provides a direct quantitative measure of population density with minimal annotation, making it particularly suitable for dense scenes. Our framework leverages both input images and population density information in a mutually beneficial multi-task design. To accelerate convergence and balance task contributions, we introduce a dynamic loss function weighting strategy. Furthermore, we incorporate adversarial training to enhance the robustness of both VIF and crowd counting, improving the model's stability and resilience to adversarial attacks. Experimental results on public datasets demonstrate that FusionCounting not only enhances image fusion quality but also achieves superior crowd counting performance.
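The abstract mentions a dynamic loss-function weighting strategy for balancing the fusion and counting tasks, but does not specify the rule. The following is a generic, dependency-free sketch of one common approach (softmax-style weighting over current loss values); the function name, temperature parameter, and normalization convention are illustrative assumptions, not the paper's actual formulation.

```python
import math

def dynamic_task_weights(losses, temperature=2.0):
    """Softmax-style dynamic weighting for multi-task losses.

    Tasks with a currently larger loss receive a larger weight, so
    neither the fusion task nor the counting task dominates training.
    This is a generic illustration; the exact weighting rule used by
    FusionCounting is not specified in the abstract.
    """
    scaled = [l / temperature for l in losses]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    # Normalize so the weights sum to the number of tasks (a common convention).
    return [len(losses) * e / z for e in exps]

# Combine a (hypothetical) fusion loss and counting loss into one objective.
fusion_loss, counting_loss = 0.8, 0.2
w_fusion, w_count = dynamic_task_weights([fusion_loss, counting_loss])
total_loss = w_fusion * fusion_loss + w_count * counting_loss
```

Because the weights are recomputed each step from the current loss values, the balance between tasks adapts as training progresses, which is the behavior the abstract attributes to its strategy.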
Related papers
- SSVIF: Self-Supervised Segmentation-Oriented Visible and Infrared Image Fusion [8.61849023109742]
We propose a self-supervised training framework for segmentation-oriented VIF methods (SSVIF). We introduce a novel self-supervised task, cross-segmentation consistency, that enables the fusion model to learn high-level semantic features without the supervision of segmentation labels. Our proposed SSVIF outperforms traditional VIF methods and rivals supervised segmentation-oriented ones.
arXiv Detail & Related papers (2025-09-26T15:05:33Z)
- MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning [17.67073665165365]
We propose a concise and universal training framework, MultiTaskVIF, for segmentation-oriented VIF models. In this framework, we introduce a multi-task head decoder (MTH) to simultaneously output both the fused image and the segmentation result during training.
arXiv Detail & Related papers (2025-05-10T14:47:19Z)
- OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning [19.22887628187884]
A novel LVM-guided fusion framework with Object-aware and Contextual COntrastive learning is proposed. A novel feature interaction fusion network is also designed to resolve information conflicts in fused images caused by modality differences. The effectiveness of the proposed method is validated, and exceptional performance is also demonstrated on downstream visual tasks.
arXiv Detail & Related papers (2025-03-24T12:57:23Z)
- Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond [52.486290612938895]
We propose a novel method that leverages semantic knowledge from the Segment Anything Model (SAM) to grow the quality of fusion results and enable downstream task adaptability. Specifically, we design a Semantic Persistent Attention (SPA) module that efficiently maintains source information via a persistent repository while extracting high-level semantic priors from SAM. Our method achieves a balance between high-quality visual results and downstream task adaptability while maintaining practical deployment efficiency.
arXiv Detail & Related papers (2025-03-03T06:16:31Z)
- Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption [65.06388526722186]
Infrared-visible image fusion is a critical task in computer vision. There is a lack of recent comprehensive surveys that address this rapidly expanding domain. We introduce a multi-dimensional framework to elucidate common learning-based IVIF methods.
arXiv Detail & Related papers (2025-01-18T13:17:34Z)
- Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. MUCA also utilizes a cross-teacher-student attention mechanism to guide the student network in constructing more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
- Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation [66.15246197473897]
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a Multi-interactive Feature learning architecture for image fusion and Segmentation.
arXiv Detail & Related papers (2023-08-04T01:03:58Z)
- Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond [50.556961575275345]
We build an image fusion module to fuse complementary characteristics and cascade dual task-related modules.
We develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning.
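The summary above describes dynamic weighted aggregation of per-task gradients. As a generic, dependency-free sketch of that idea, the helper below combines gradient vectors from two tasks under given balance weights; the function name and the toy gradients are illustrative assumptions, and the paper's actual bi-level formulation with its first-order approximation is more involved.

```python
def weighted_gradient_aggregation(task_grads, weights):
    """Combine per-task gradient vectors into a single update direction.

    `task_grads` holds one gradient vector (list of floats) per task and
    `weights` balances their influence. This is only a generic sketch of
    the dynamic weighted aggregation idea summarized above, not the
    paper's exact rule.
    """
    dim = len(task_grads[0])
    return [sum(w * g[i] for w, g in zip(weights, task_grads))
            for i in range(dim)]

# Two tasks pulling in orthogonal directions, balanced 70/30.
fusion_grad, task_grad = [1.0, 0.0], [0.0, 1.0]
update = weighted_gradient_aggregation([fusion_grad, task_grad], [0.7, 0.3])
# → [0.7, 0.3]
```

In a dynamic scheme the weights would themselves be recomputed each step (for example from current loss values) rather than fixed as in this toy call.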
arXiv Detail & Related papers (2023-05-11T10:55:34Z)
- A Clustering-guided Contrastive Fusion for Multi-view Representation Learning [7.630965478083513]
We propose a deep fusion network to fuse view-specific representations into the view-common representation.
We also design an asymmetrical contrastive strategy that aligns the view-common representation and each view-specific representation.
In the incomplete-view scenario, our proposed method resists noise interference better than competing methods.
arXiv Detail & Related papers (2022-12-28T07:21:05Z)
- CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion [68.78897015832113]
We propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion. Our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation.
arXiv Detail & Related papers (2022-11-20T12:02:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.