Multi-interactive Feature Learning and a Full-time Multi-modality
Benchmark for Image Fusion and Segmentation
- URL: http://arxiv.org/abs/2308.02097v1
- Date: Fri, 4 Aug 2023 01:03:58 GMT
- Title: Multi-interactive Feature Learning and a Full-time Multi-modality
Benchmark for Image Fusion and Segmentation
- Authors: Jinyuan Liu, Zhu Liu, Guanyao Wu, Long Ma, Risheng Liu, Wei Zhong,
Zhongxuan Luo, Xin Fan
- Abstract summary: Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a Multi-interactive Feature learning architecture for image fusion and Segmentation, namely SegMiF.
- Score: 66.15246197473897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modality image fusion and segmentation play a vital role in autonomous
driving and robotic operation. Early efforts focus on boosting the performance
for only one task, \emph{e.g.,} fusion or segmentation, making it hard to
reach~`Best of Both Worlds'. To overcome this issue, in this paper, we propose
a \textbf{M}ulti-\textbf{i}nteractive \textbf{F}eature learning architecture
for image fusion and \textbf{Seg}mentation, namely SegMiF, and exploit
dual-task correlation to promote the performance of both tasks. The SegMiF is
of a cascade structure, containing a fusion sub-network and a commonly used
segmentation sub-network. By bridging intermediate features between the two
components, the knowledge learned from the segmentation task can effectively
assist the fusion task, and the improved fusion network in turn helps the
segmentation network perform more accurately. Besides, a hierarchical
interactive attention block is established to ensure fine-grained mapping of
all the vital information between the two tasks, so that the modality and
semantic features can fully interact. In addition, a dynamic weight factor
is introduced to automatically adjust the corresponding weights of each task,
which balances the interactive feature correspondence and avoids laborious
manual tuning. Furthermore, we construct a smart multi-wave
binocular imaging system and collect a full-time multi-modality benchmark with
15 annotated pixel-level categories for image fusion and segmentation.
Extensive experiments on several public datasets and our benchmark demonstrate
that the proposed method outputs visually appealing fused images and achieves
on average $7.66\%$ higher segmentation mIoU in real-world scenes than the
state-of-the-art approaches. The source code and benchmark are available at
\url{https://github.com/JinyuanLiu-CV/SegMiF}.
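To make the feature-bridging idea in the abstract more concrete, the following is a minimal, hypothetical sketch of an interactive attention bridge that lets segmentation features guide fusion features via cross-attention. The module and tensor names (`InteractiveAttention`, `fusion_feat`, `seg_feat`, the channel and head sizes) are illustrative assumptions and do not reproduce the authors' SegMiF implementation.

```python
# Hypothetical sketch of an interactive attention bridge between a fusion
# feature map and a segmentation feature map (NOT the official SegMiF code).
import torch
import torch.nn as nn

class InteractiveAttention(nn.Module):
    """Cross-attends fusion features (queries) to segmentation features
    (keys/values) so semantic cues can guide the fusion stream."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, fusion_feat: torch.Tensor, seg_feat: torch.Tensor) -> torch.Tensor:
        # Both inputs: (B, C, H, W). Flatten spatial dims into a token axis.
        b, c, h, w = fusion_feat.shape
        q = fusion_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        kv = seg_feat.flatten(2).transpose(1, 2)     # (B, H*W, C)
        out, _ = self.attn(q, kv, kv)                # semantic-guided cross-attention
        out = self.norm(out + q)                     # residual connection
        return out.transpose(1, 2).reshape(b, c, h, w)

# Toy usage with random features standing in for real intermediate activations.
bridge = InteractiveAttention(channels=64)
fused = bridge(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```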
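Likewise, the dynamic weight factor that balances the fusion and segmentation objectives can be illustrated with a small learnable task-weighting scheme. This is a generic multi-task weighting sketch under assumed names (`DynamicTaskWeighting`, `log_var_fusion`, `log_var_seg`); the actual weighting rule used by SegMiF may differ.

```python
# Hypothetical sketch of automatically balancing two task losses with
# learnable weights (a common multi-task trick; not necessarily SegMiF's rule).
import torch
import torch.nn as nn

class DynamicTaskWeighting(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable log-variances, one per task, optimized jointly with the network.
        self.log_var_fusion = nn.Parameter(torch.zeros(()))
        self.log_var_seg = nn.Parameter(torch.zeros(()))

    def forward(self, loss_fusion: torch.Tensor, loss_seg: torch.Tensor) -> torch.Tensor:
        # Each loss is scaled by exp(-log_var); the +log_var terms keep the
        # weights from collapsing to zero.
        w_f = torch.exp(-self.log_var_fusion)
        w_s = torch.exp(-self.log_var_seg)
        return w_f * loss_fusion + self.log_var_fusion + w_s * loss_seg + self.log_var_seg

# Toy usage: combine two scalar losses without hand-tuned coefficients.
weighting = DynamicTaskWeighting()
total = weighting(torch.tensor(0.8), torch.tensor(1.3))
total.backward()  # gradients flow to the task weights (and, in practice, the networks)
```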
Related papers
- A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion [41.34335755315773]
Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images.
We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy.
Our method has obtained competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks.
arXiv Detail & Related papers (2024-06-11T09:32:40Z)
- FocSAM: Delving Deeply into Focused Objects in Segmenting Anything [58.042354516491024]
The Segment Anything Model (SAM) marks a notable milestone in segmentation models.
We propose FocSAM with a pipeline redesigned on two pivotal aspects.
First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object.
Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from a few initial clicks.
arXiv Detail & Related papers (2024-05-29T02:34:13Z)
- TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven Image Fusion Network [2.7387720378113554]
We introduce a target and semantic awareness-driven fusion network called TSJNet.
It comprises fusion, detection, and segmentation sub-networks arranged in a series structure.
It can generate visually pleasing fused results, achieving an average increase of 2.84% and 7.47% in object detection and segmentation mAP @0.5 and mIoU, respectively.
arXiv Detail & Related papers (2024-02-02T08:37:38Z)
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
- M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$^3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z)
- Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond [50.556961575275345]
We build an image fusion module to fuse complementary characteristics and cascade dual task-related modules.
We develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning.
arXiv Detail & Related papers (2023-05-11T10:55:34Z)
- Unsupervised Image Fusion Method based on Feature Mutual Mapping [16.64607158983448]
We propose an unsupervised adaptive image fusion method to address the above issues.
We construct a global map to measure the connections of pixels between the input source images.
Our method achieves superior performance in both visual perception and objective evaluation.
arXiv Detail & Related papers (2022-01-25T07:50:14Z)
- Few-shot Segmentation with Optimal Transport Matching and Message Flow [50.9853556696858]
It is essential for few-shot semantic segmentation to fully utilize the support information.
We propose a Correspondence Matching Network (CMNet) with an Optimal Transport Matching module.
Experiments on PASCAL VOC 2012, MS COCO, and FSS-1000 datasets show that our network achieves new state-of-the-art few-shot segmentation performance.
arXiv Detail & Related papers (2021-08-19T06:26:11Z)
- CMF: Cascaded Multi-model Fusion for Referring Image Segmentation [24.942658173937563]
We address the task of referring image segmentation (RIS), which aims at predicting a segmentation mask for the object described by a natural language expression.
We propose a simple yet effective Cascaded Multi-modal Fusion (CMF) module, which stacks multiple atrous convolutional layers in parallel.
Experimental results on four benchmark datasets demonstrate that our method outperforms most state-of-the-art methods.
arXiv Detail & Related papers (2021-06-16T08:18:39Z)