TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven
Image Fusion Network
- URL: http://arxiv.org/abs/2402.01212v1
- Date: Fri, 2 Feb 2024 08:37:38 GMT
- Title: TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven
Image Fusion Network
- Authors: Yuchan Jie, Yushen Xu, Xiaosong Li, Haishu Tan
- Abstract summary: We introduce a target and semantic awareness-driven fusion network called TSJNet.
It comprises fusion, detection, and segmentation subnetworks arranged in a series structure.
It generates visually pleasing fused results, with average gains of 2.84% in object detection mAP@0.5 and 7.47% in segmentation mIoU.
- Score: 2.7387720378113554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modality image fusion involves integrating complementary information
from different modalities into a single image. Current methods primarily focus
on enhancing image fusion with a single advanced task such as incorporating
semantic or object-related information into the fusion process. This approach
makes it difficult to achieve multiple objectives simultaneously. We
introduce a target and semantic awareness joint-driven fusion network called
TSJNet. TSJNet comprises fusion, detection, and segmentation subnetworks
arranged in a series structure. It leverages object and semantically relevant
information derived from dual high-level tasks to guide the fusion network.
Additionally, we propose a local significant feature extraction module with a
double parallel branch structure to fully capture the fine-grained features of
cross-modal images and foster interaction among modalities, targets, and
segmentation information. We conducted extensive experiments on four publicly
available datasets (MSRS, M3FD, RoadScene, and LLVIP). The results demonstrate
that TSJNet generates visually pleasing fused results, with average increases
of 2.84% in object detection mAP@0.5 and 7.47% in segmentation mIoU compared
with state-of-the-art methods.
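  As a rough illustration of the series arrangement the abstract describes, the following is a minimal PyTorch-style sketch: a fusion subnetwork built from double parallel-branch local feature extractors produces a fused image, and losses from downstream detection and segmentation subnetworks are folded into the fusion objective. The module names (LocalFeatureExtractor, FusionNet, joint_loss), branch designs, channel sizes, and loss weights are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class LocalFeatureExtractor(nn.Module):
    """Illustrative double parallel-branch block for fine-grained local features."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Branch 1: standard 3x3 convolution for local detail (assumption).
        self.branch_local = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Branch 2: dilated 3x3 convolution for wider context (assumption).
        self.branch_context = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(inplace=True),
        )
        self.merge = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.merge(
            torch.cat([self.branch_local(x), self.branch_context(x)], dim=1)
        )


class FusionNet(nn.Module):
    """Toy fusion subnetwork: encodes each modality, decodes one fused image."""

    def __init__(self, ch=16):
        super().__init__()
        self.enc_ir = LocalFeatureExtractor(1, ch)
        self.enc_vis = LocalFeatureExtractor(1, ch)
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * ch, ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, ir, vis):
        feats = torch.cat([self.enc_ir(ir), self.enc_vis(vis)], dim=1)
        return self.decoder(feats)


def joint_loss(fused, ir, vis, det_loss, seg_loss, lam=0.1, mu=0.1):
    """Fusion fidelity term plus detection/segmentation losses fed back from
    the downstream subnetworks; the weights lam and mu are placeholders."""
    fidelity = nn.functional.l1_loss(fused, torch.max(ir, vis))
    return fidelity + lam * det_loss + mu * seg_loss


if __name__ == "__main__":
    ir = torch.rand(2, 1, 64, 64)   # infrared batch
    vis = torch.rand(2, 1, 64, 64)  # visible batch (single channel for brevity)
    fused = FusionNet()(ir, vis)
    # det_loss / seg_loss would come from detector and segmenter subnetworks
    # run on the fused image; constants stand in for them here.
    loss = joint_loss(fused, ir, vis, det_loss=torch.tensor(0.5),
                      seg_loss=torch.tensor(0.3))
    print(fused.shape, loss.item())
```

  In the paper's pipeline the detection and segmentation subnetworks are trained in series with the fusion subnetwork; here they are abstracted into scalar losses purely to show how dual-task guidance can enter the fusion objective.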
Related papers
- SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything [14.937761564543239]
We propose a two-stage refinement framework that fully integrates images and prompts globally and locally.
The first-stage GlobalDiff Refiner is a lightweight early fusion network that combines the whole image and prompts.
The second-stage PatchDiff Refiner locates the object detail window according to the mask and prompts, then refines the local details of the object.
arXiv Detail & Related papers (2024-08-21T11:18:35Z) - A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion [41.34335755315773]
Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images.
We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy.
Our method has obtained competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks.
arXiv Detail & Related papers (2024-06-11T09:32:40Z) - Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Cross-modality fusing information from different modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on M3FD and 4.9% on FLIR-Aligned datasets.
arXiv Detail & Related papers (2024-04-14T05:28:46Z) - From Text to Pixels: A Context-Aware Semantic Synergy Solution for
Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z) - Multi-interactive Feature Learning and a Full-time Multi-modality
Benchmark for Image Fusion and Segmentation [66.15246197473897]
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a Multi-interactive Feature learning architecture for image fusion and Segmentation.
arXiv Detail & Related papers (2023-08-04T01:03:58Z) - MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named as MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z) - A Task-guided, Implicitly-searched and Meta-initialized Deep Model for
Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z) - CoADNet: Collaborative Aggregation-and-Distribution Networks for
Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)