Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation
- URL: http://arxiv.org/abs/2309.06618v3
- Date: Fri, 15 Dec 2023 23:54:04 GMT
- Title: Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation
- Authors: Yixing Lu, Zhaoxin Fan, Min Xu
- Abstract summary: We introduce a novel semi-supervised learning framework tailored for medical image segmentation.
Central to our approach is the innovative Multi-scale Text-aware ViT-CNN Fusion scheme.
We propose the Multi-Axis Consistency framework for generating robust pseudo labels.
- Score: 10.628250457432499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a novel semi-supervised learning framework
tailored for medical image segmentation. Central to our approach is the
innovative Multi-scale Text-aware ViT-CNN Fusion scheme. This scheme adeptly
combines the strengths of both ViTs and CNNs, capitalizing on the unique
advantages of both architectures as well as the complementary information in
vision-language modalities. Further enriching our framework, we propose the
Multi-Axis Consistency framework for generating robust pseudo labels, thereby
enhancing the semi-supervised learning process. Our extensive experiments on
several widely-used datasets unequivocally demonstrate the efficacy of our
approach.
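The pseudo-labeling idea described above can be illustrated with a minimal sketch. Note that the fusion rule (mean of per-axis softmax maps), the confidence threshold, and the function name below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def multi_axis_pseudo_labels(probs_per_axis, conf_thresh=0.9):
    """Fuse per-axis probability maps into pseudo labels.

    probs_per_axis: list of arrays, each of shape (C, D, H, W) --
    softmax probabilities predicted along different volume axes,
    already resampled back to a common orientation.
    Returns (labels, mask): voxel-wise argmax labels and a boolean
    mask marking voxels whose fused confidence exceeds conf_thresh;
    only masked voxels would contribute to the unsupervised loss.
    """
    # Average the class-probability maps across prediction axes.
    fused = np.mean(np.stack(probs_per_axis, axis=0), axis=0)
    labels = fused.argmax(axis=0)            # (D, H, W) pseudo labels
    mask = fused.max(axis=0) >= conf_thresh  # keep confident voxels only
    return labels, mask
```

In this sketch, disagreement between axes lowers the fused maximum probability, so inconsistent voxels fall below the threshold and are excluded, which is one simple way to realize a consistency-based pseudo-label filter.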
Related papers
- MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model [49.931663904599205]
MaVEn is an innovative framework designed to enhance the capabilities of Multimodal Large Language Models (MLLMs) in multi-image reasoning.
We show that MaVEn significantly enhances MLLMs' understanding in complex multi-image scenarios, while also improving performance in single-image contexts.
arXiv Detail & Related papers (2024-08-22T11:57:16Z)
- Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS)
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z)
- Shapley Value-based Contrastive Alignment for Multimodal Information Extraction [17.04865437165252]
We introduce a new paradigm of Image-Context-Text interaction.
We propose a novel Shapley Value-based Contrastive Alignment (Shap-CA) method.
Our method significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-25T08:15:43Z)
- Multi-Scale Cross Contrastive Learning for Semi-Supervised Medical Image Segmentation [14.536384387956527]
We develop a novel Multi-Scale Cross Supervised Contrastive Learning framework to segment structures in medical images.
Our approach contrasts multi-scale features based on ground-truth and cross-predicted labels, in order to extract robust feature representations.
It outperforms state-of-the-art semi-supervised methods by more than 3.0% in Dice.
arXiv Detail & Related papers (2023-06-25T16:55:32Z)
- Hierarchical Aligned Multimodal Learning for NER on Tweet Posts [12.632808712127291]
Multimodal named entity recognition (MNER) has attracted increasing attention.
We propose a novel approach, which can dynamically align the image and text sequence.
We conduct experiments on two open datasets, and the results and detailed analysis demonstrate the advantage of our model.
arXiv Detail & Related papers (2023-05-15T06:14:36Z)
- When CNN Meet with ViT: Towards Semi-Supervised Learning for Multi-Class Medical Image Semantic Segmentation [13.911947592067678]
In this paper, an advanced consistency-aware pseudo-label-based self-ensembling approach is presented.
Our framework consists of a feature-learning module which is enhanced by ViT and CNN mutually, and a guidance module which is robust for consistency-aware purposes.
Experimental results show that the proposed method achieves state-of-the-art performance on a public benchmark data set.
arXiv Detail & Related papers (2022-08-12T18:21:22Z)
- Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation [87.01669173673288]
We propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network.
A co-attention mechanism is embedded in the EFN to realize the parallel update of multi-modal features.
The experimental results on four benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-05-05T02:27:25Z)
- Embedded Deep Bilinear Interactive Information and Selective Fusion for Multi-view Learning [70.67092105994598]
We propose a novel multi-view learning framework to improve multi-view classification.
In particular, we train different deep neural networks to learn various intra-view representations.
Experiments on six publicly available datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-07-13T01:13:23Z)
- Learning the Compositional Visual Coherence for Complementary Recommendations [62.60648815930101]
Complementary recommendations aim at providing users with product suggestions that are supplementary and compatible with their obtained items.
We propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents.
arXiv Detail & Related papers (2020-06-08T06:57:18Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.