SSVIF: Self-Supervised Segmentation-Oriented Visible and Infrared Image Fusion
- URL: http://arxiv.org/abs/2509.22450v1
- Date: Fri, 26 Sep 2025 15:05:33 GMT
- Title: SSVIF: Self-Supervised Segmentation-Oriented Visible and Infrared Image Fusion
- Authors: Zixian Zhao, Xingchen Zhang
- Abstract summary: We propose a self-supervised training framework for segmentation-oriented VIF methods (SSVIF). We introduce a novel self-supervised task, cross-segmentation consistency, which enables the fusion model to learn high-level semantic features without the supervision of segmentation labels. Our proposed SSVIF outperforms traditional VIF methods and rivals supervised segmentation-oriented ones.
- Score: 8.61849023109742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visible and infrared image fusion (VIF) has gained significant attention in recent years due to its wide application in tasks such as scene segmentation and object detection. VIF methods can be broadly classified into traditional VIF methods and application-oriented VIF methods. Traditional methods focus solely on improving the quality of fused images, while application-oriented VIF methods additionally consider the performance of downstream tasks on fused images by introducing task-specific loss terms during training. However, compared to traditional methods, application-oriented VIF methods require datasets labeled for downstream tasks (e.g., semantic segmentation or object detection), making data acquisition labor-intensive and time-consuming. To address this issue, we propose a self-supervised training framework for segmentation-oriented VIF methods (SSVIF). Leveraging the consistency between feature-level fusion-based segmentation and pixel-level fusion-based segmentation, we introduce a novel self-supervised task, cross-segmentation consistency, that enables the fusion model to learn high-level semantic features without the supervision of segmentation labels. Additionally, we design a two-stage training strategy and a dynamic weight adjustment method for effective joint learning within our self-supervised framework. Extensive experiments on public datasets demonstrate the effectiveness of our proposed SSVIF. Remarkably, although trained only on unlabeled visible-infrared image pairs, our SSVIF outperforms traditional VIF methods and rivals supervised segmentation-oriented ones. Our code will be released upon acceptance.
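The abstract does not specify the exact form of the cross-segmentation consistency objective. As a hedged illustration only, one plausible instantiation is a symmetric KL divergence between the per-pixel class distributions predicted by the feature-level and pixel-level segmentation paths; the function names, tensor shapes, and the choice of symmetric KL are assumptions for this sketch, not the authors' implementation:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_segmentation_consistency(logits_pixel, logits_feature, eps=1e-8):
    """Symmetric KL divergence between two segmentation predictions,
    averaged over all pixels.

    logits_pixel, logits_feature: arrays of shape (H, W, C) holding class
    logits from the pixel-level and feature-level fusion paths (assumed
    shapes for this sketch).
    """
    p = softmax(logits_pixel)
    q = softmax(logits_feature)
    # KL(p || q) and KL(q || p), summed over the class axis per pixel.
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))
```

The loss is zero when the two paths predict identical class distributions and grows as they diverge, so minimizing it pushes the fusion model to produce features from which both segmentation routes agree, without needing ground-truth labels.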
Related papers
- Scaling Dense Event-Stream Pretraining from Visual Foundation Models [112.44243079477137]
We launch a novel self-supervised pretraining method that distills visual foundation models (VFMs) to push the boundaries of event representation at scale. We curate an extensive synchronized image-event collection to amplify cross-modal alignment. We extend the alignment objective to semantic structures provided off-the-shelf by VFMs, indicating a broader receptive field and stronger supervision.
arXiv Detail & Related papers (2026-03-04T12:06:09Z) - FusionCounting: Robust visible-infrared image fusion guided by crowd counting via multi-task learning [16.955260249719533]
Visible and infrared image fusion (VIF) is an important multimedia task in computer vision. Recent studies have begun incorporating downstream tasks, such as semantic segmentation and object detection, to provide semantic guidance for VIF. We propose FusionCounting, a novel multi-task learning framework that integrates crowd counting into the VIF process.
arXiv Detail & Related papers (2025-08-28T14:15:18Z) - SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion [65.80051636480836]
This paper proposes a conditional diffusion model guided by the Segment Anything Model (SAM) to achieve high-fidelity and semantically-aware image fusion. The framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks as a condition to drive the diffusion model's coarse-to-fine denoising generation. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations.
arXiv Detail & Related papers (2025-08-07T10:58:52Z) - AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation [8.563354084119062]
Video Frame Interpolation (VFI) is a fundamental Low-Level Vision (LLV) task that synthesizes intermediate frames between existing ones. We introduce AceVFI, the most comprehensive survey on VFI to date, covering over 250 papers across these approaches. We categorize the learning paradigms of VFI methods, namely Center-Time Frame Interpolation (CTFI) and Arbitrary-Time Frame Interpolation (ATFI).
arXiv Detail & Related papers (2025-06-01T16:01:24Z) - MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning [17.67073665165365]
We propose a concise and universal training framework, MultiTaskVIF, for segmentation-oriented VIF models. In this framework, we introduce a multi-task head decoder (MTH) to simultaneously output both the fused image and the segmentation result during training.
arXiv Detail & Related papers (2025-05-10T14:47:19Z) - Rethinking the Evaluation of Visible and Infrared Image Fusion [39.53356881392218]
Visible and Infrared Image Fusion (VIF) has garnered significant interest across a wide range of high-level vision tasks.
This paper proposes a semantic-oriented Evaluation Approach (SEA) to assess VIF methods.
arXiv Detail & Related papers (2024-10-09T12:12:08Z) - Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot semantic segmentation has evolved into in-context tasks, becoming a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z) - DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and Authentication [50.017055360261665]
We introduce DiffVein, a unified diffusion model-based framework which simultaneously addresses vein segmentation and authentication tasks.
For better feature interaction between these two branches, we introduce two specialized modules.
In this way, our framework allows for a dynamic interplay between diffusion and segmentation embeddings.
arXiv Detail & Related papers (2024-02-03T06:49:42Z) - A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models [7.428199805959228]
Few-shot semantic segmentation (FSS) is a crucial challenge in computer vision. With the emergence of vision foundation models (VFMs) as generalist feature extractors, we seek to explore the adaptation of these models for FSS. We propose a novel realistic benchmark with a simple and straightforward adaptation process tailored for this task.
arXiv Detail & Related papers (2024-01-20T19:50:51Z) - Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores considering the discrepancy between an image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.