A Simple and Robust Framework for Cross-Modality Medical Image
Segmentation applied to Vision Transformers
- URL: http://arxiv.org/abs/2310.05572v1
- Date: Mon, 9 Oct 2023 09:51:44 GMT
- Title: A Simple and Robust Framework for Cross-Modality Medical Image
Segmentation applied to Vision Transformers
- Authors: Matteo Bastico, David Ryckelynck, Laurent Corté, Yannick Tillier,
Etienne Decencière
- Abstract summary: We propose a simple framework to achieve fair image segmentation of multiple modalities using a single conditional model.
We show that our framework outperforms other cross-modality segmentation methods on the Multi-Modality Whole Heart Segmentation Challenge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When it comes to clinical images, automatic segmentation has a wide variety
of applications and a considerable diversity of input domains, such as
different types of Magnetic Resonance Images (MRIs) and Computed Tomography
(CT) scans. This heterogeneity is a challenge for cross-modality algorithms,
which should perform equally well regardless of the input image type fed to them.
Often, segmentation models are trained using a single modality, preventing
generalization to other types of input data without resorting to transfer
learning techniques. Furthermore, the multi-modal or cross-modality
architectures proposed in the literature frequently require registered images,
which are not easy to collect in clinical environments, or need additional
processing steps, such as synthetic image generation. In this work, we propose
a simple framework to achieve fair image segmentation of multiple modalities
using a single conditional model that adapts its normalization layers based on
the input type, trained with non-registered interleaved mixed data. We show
that our framework outperforms other cross-modality segmentation methods, when
applied to the same 3D UNet baseline model, on the Multi-Modality Whole Heart
Segmentation Challenge. Furthermore, we define the Conditional Vision
Transformer (C-ViT) encoder, based on the proposed cross-modality framework,
and we show that it brings significant improvements to the resulting
segmentation, up to 6.87% in Dice score, with respect to its baseline
reference. The code to reproduce our experiments and the trained model weights
are available at https://github.com/matteo-bastico/MI-Seg.
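To make the conditioning mechanism concrete, the sketch below shows how a normalization layer can switch its affine parameters based on an input-modality code, which is the core idea the abstract describes. The module name, the InstanceNorm3d choice, and the two-modality toy setup are illustrative assumptions rather than the authors' exact implementation; the linked repository contains the real one.

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm3d(nn.Module):
    """Normalization layer with one set of affine parameters per modality.

    Illustrative sketch: a shared, affine-free normalization is followed by
    a per-modality scale/shift selected at run time by a modality code.
    """

    def __init__(self, num_features: int, num_modalities: int):
        super().__init__()
        # Statistics are shared by all modalities.
        self.norm = nn.InstanceNorm3d(num_features, affine=False)
        # One (gamma, beta) pair per modality, e.g. 0 = CT, 1 = MRI.
        self.gamma = nn.Embedding(num_modalities, num_features)
        self.beta = nn.Embedding(num_modalities, num_features)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W); modality: (B,) integer codes.
        h = self.norm(x)
        g = self.gamma(modality).view(-1, x.size(1), 1, 1, 1)
        b = self.beta(modality).view(-1, x.size(1), 1, 1, 1)
        return g * h + b

# Training proceeds on non-registered, interleaved mixed batches: each batch
# carries its own modality codes, so CT and MRI volumes can alternate freely
# without any pairing or registration between scans.
layer = ConditionalInstanceNorm3d(num_features=16, num_modalities=2)
ct_batch = torch.randn(2, 16, 8, 32, 32)   # toy CT feature maps
mri_batch = torch.randn(2, 16, 8, 32, 32)  # toy MRI feature maps
for x, m in [(ct_batch, torch.tensor([0, 0])),
             (mri_batch, torch.tensor([1, 1]))]:
    y = layer(x, m)  # same weights, modality-specific normalization
```

In the C-ViT encoder the same switch is applied to the transformer's normalization layers; the 3D-convolutional flavor above is just the simplest setting in which to illustrate the mechanism.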
Related papers
- MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned Text Embedding and Alternating Training [10.558275557142137]
We propose a simple Multi-Modal Segmentation (MulModSeg) strategy to enhance medical image segmentation across multiple modalities.
MulModSeg incorporates a modality-conditioned text embedding framework via a frozen text encoder; an illustrative sketch of this conditioning idea appears after this list.
It consistently outperforms previous methods in segmenting abdominal multi-organ and cardiac substructures for both CT and MR.
arXiv Detail & Related papers (2024-11-23T14:37:01Z)
- Data Adaptive Few-shot Multi Label Segmentation with Foundation Model [0.0]
State-of-the-art methods for few-shot segmentation suffer from sub-optimal performance on medical images.
We propose foundation model (FM) based adapters for single-label and multi-label localization and segmentation.
arXiv Detail & Related papers (2024-10-13T07:29:13Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only preserves both local and global context information but also captures long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z)
- Interpretable Small Training Set Image Segmentation Network Originated from Multi-Grid Variational Model [5.283735137946097]
Deep learning (DL) methods have been proposed and widely used for image segmentation.
DL methods usually require large amounts of manually segmented training data and suffer from poor interpretability.
In this paper, we replace the hand-crafted regularity term in the Mumford-Shah (MS) model with a data-adaptive generalized learnable regularity term.
arXiv Detail & Related papers (2023-06-25T02:34:34Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto standard of Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE leverages a Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities.
We show the applicability of MGP-VAE on brain tumor segmentation where one, two, or three of the four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
- JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans [27.180136688977512]
We propose a novel multi-task learning system, JSSR, based on an end-to-end 3D convolutional neural network.
The system is optimized to satisfy the implicit constraints between different tasks in an unsupervised manner.
It consistently outperforms conventional state-of-the-art multi-modal registration methods.
arXiv Detail & Related papers (2020-05-25T16:30:02Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning-based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
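As noted in the MulModSeg entry above, the sketch below illustrates modality-conditioned text embedding with a frozen text encoder. The FiLM-style scale-and-shift injection, the prompt strings, and the frozen lookup table standing in for a pretrained text encoder (e.g. CLIP) are assumptions made to keep the example self-contained; MulModSeg's actual design may differ (see its arXiv entry).

```python
import torch
import torch.nn as nn

class FrozenTextConditioner(nn.Module):
    """Sketch of modality-conditioned text embedding (FiLM-style injection).

    A real system would obtain the embedding from a frozen pretrained text
    encoder applied to a prompt such as "a computed tomography scan"; here a
    frozen lookup table stands in for that encoder so the sketch runs as-is.
    """

    def __init__(self, num_features: int, text_dim: int = 512):
        super().__init__()
        # Frozen stand-in for a pretrained text encoder's prompt embeddings
        # (index 0 = CT prompt, index 1 = MR prompt).
        self.text_emb = nn.Embedding(2, text_dim)
        self.text_emb.weight.requires_grad_(False)
        # Trainable projection from text space to per-channel scale/shift.
        self.to_scale_shift = nn.Linear(text_dim, 2 * num_features)

    def forward(self, feats: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, D, H, W); modality: (B,) integer codes (0 = CT, 1 = MR).
        emb = self.text_emb(modality)                         # (B, text_dim)
        scale, shift = self.to_scale_shift(emb).chunk(2, dim=1)
        scale = scale.view(-1, feats.size(1), 1, 1, 1)
        shift = shift.view(-1, feats.size(1), 1, 1, 1)
        return feats * (1 + scale) + shift

# Toy usage: a mixed batch with one CT and one MR volume. MulModSeg's
# alternating training would instead feed CT-only and MR-only batches in turn.
cond = FrozenTextConditioner(num_features=16)
feats = torch.randn(2, 16, 8, 32, 32)
out = cond(feats, torch.tensor([0, 1]))
```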
This list is automatically generated from the titles and abstracts of the papers on this site.