Medical Slice Transformer: Improved Diagnosis and Explainability on 3D Medical Images with DINOv2
- URL: http://arxiv.org/abs/2411.15802v1
- Date: Sun, 24 Nov 2024 12:11:11 GMT
- Title: Medical Slice Transformer: Improved Diagnosis and Explainability on 3D Medical Images with DINOv2
- Authors: Gustav Müller-Franzes, Firas Khader, Robert Siepmann, Tianyu Han, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn
- Abstract summary: We introduce the Medical Slice Transformer (MST) framework to adapt 2D self-supervised models for 3D medical image analysis.
MST offers enhanced diagnostic accuracy and explainability compared to convolutional neural networks.
- Score: 1.6275928583134276
- License:
- Abstract: MRI and CT are essential clinical cross-sectional imaging techniques for diagnosing complex conditions. However, large 3D datasets with annotations for deep learning are scarce. While methods like DINOv2 are encouraging for 2D image analysis, these methods have not been applied to 3D medical images. Furthermore, deep learning models often lack explainability due to their "black-box" nature. This study aims to extend 2D self-supervised models, specifically DINOv2, to 3D medical imaging while evaluating their potential for explainable outcomes. We introduce the Medical Slice Transformer (MST) framework to adapt 2D self-supervised models for 3D medical image analysis. MST combines a Transformer architecture with a 2D feature extractor, i.e., DINOv2. We evaluate its diagnostic performance against a 3D convolutional neural network (3D ResNet) across three clinical datasets: breast MRI (651 patients), chest CT (722 patients), and knee MRI (1199 patients). Both methods were tested for diagnosing breast cancer, predicting lung nodule dignity, and detecting meniscus tears. Diagnostic performance was assessed by calculating the Area Under the Receiver Operating Characteristic Curve (AUC). Explainability was evaluated through a radiologist's qualitative comparison of saliency maps based on slice and lesion correctness. P-values were calculated using DeLong's test. MST achieved higher AUC values compared to ResNet across all three datasets: breast (0.94±0.01 vs. 0.91±0.02, P=0.02), chest (0.95±0.01 vs. 0.92±0.02, P=0.13), and knee (0.85±0.04 vs. 0.69±0.05, P=0.001). Saliency maps were consistently more precise and anatomically correct for MST than for ResNet. Self-supervised 2D models like DINOv2 can be effectively adapted for 3D medical imaging using MST, offering enhanced diagnostic accuracy and explainability compared to convolutional neural networks.
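The abstract describes MST only at a high level: a shared 2D feature extractor (DINOv2) embeds each slice, and a Transformer aggregates the slice embeddings into one volume-level prediction. The NumPy sketch below illustrates that data flow under loudly stated assumptions: `encode_slice` is a hypothetical stand-in for DINOv2 (a random linear projection, not a pretrained model), the aggregator is a single untrained self-attention layer with a CLS token and slice-position embeddings, and the CLS-to-slice attention weights serve as a crude per-slice relevance score. None of these names, sizes, or design details come from the paper.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(h: int, w: int, d: int = 32, max_slices: int = 64, seed: int = 0) -> dict:
    """Random (untrained) weights for this illustrative, hypothetical model."""
    rng = np.random.default_rng(seed)
    return {
        "proj": rng.normal(scale=1 / np.sqrt(h * w), size=(h * w, d)),  # stand-in 2D encoder
        "pos": rng.normal(scale=0.02, size=(max_slices, d)),  # slice-position embeddings
        "cls": rng.normal(scale=0.02, size=(1, d)),  # CLS token
        "wq": rng.normal(scale=1 / np.sqrt(d), size=(d, d)),
        "wk": rng.normal(scale=1 / np.sqrt(d), size=(d, d)),
        "wv": rng.normal(scale=1 / np.sqrt(d), size=(d, d)),
        "head": rng.normal(scale=1 / np.sqrt(d), size=d),  # binary classification head
    }


def encode_slice(slice_2d: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL stand-in for a frozen 2D encoder such as DINOv2:
    flattens one slice and linearly projects it to a d-dim embedding."""
    return slice_2d.reshape(-1) @ proj


def mst_forward(volume: np.ndarray, params: dict) -> tuple[float, np.ndarray]:
    """volume: (num_slices, H, W) -> (probability, per-slice relevance)."""
    # 1) Embed every slice independently with the shared 2D encoder.
    tokens = np.stack([encode_slice(s, params["proj"]) for s in volume])  # (S, d)
    # 2) Add slice-position embeddings and prepend the CLS token.
    tokens = tokens + params["pos"][: len(volume)]
    x = np.vstack([params["cls"], tokens])  # (S + 1, d)
    # 3) One single-head self-attention layer mixes information across slices.
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (S + 1, S + 1)
    x = x + attn @ v  # residual connection
    # 4) Volume-level logit from the CLS token, squashed to a probability.
    logit = float(x[0] @ params["head"])
    prob = 1.0 / (1.0 + np.exp(-logit))
    # CLS-to-slice attention, renormalized over slices, as a crude relevance score.
    relevance = attn[0, 1:] / attn[0, 1:].sum()
    return prob, relevance


# Usage on a synthetic 10-slice volume of 16x16 images.
params = init_params(h=16, w=16)
volume = np.random.default_rng(1).normal(size=(10, 16, 16))
prob, relevance = mst_forward(volume, params)
print(f"p(positive) = {prob:.3f}; most relevant slice = {relevance.argmax()}")
```

With trained weights, the CLS row of the attention matrix indicates which slices drive the prediction, which is one plausible route to slice-level explainability; the saliency method actually used in the paper may differ.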
Related papers
- Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of chromosome arms 1p and 19q (1p/19q co-deletion) is associated with clinical outcomes in low-grade gliomas.
This study aims to use a specially designed MRI-based convolutional neural network for brain cancer detection.
arXiv (2024-09-29T07:04:26Z)
- 2D and 3D Deep Learning Models for MRI-based Parkinson's Disease Classification: A Comparative Analysis of Convolutional Kolmogorov-Arnold Networks, Convolutional Neural Networks, and Graph Convolutional Networks [0.0]
This study applies Convolutional Kolmogorov-Arnold Networks (ConvKANs) to Parkinson's disease (PD) diagnosis.
ConvKANs integrate learnable activation functions into convolutional layers; here they are used for PD classification from structural MRI.
The first 3D implementation of ConvKANs for medical imaging is presented, comparing their performance to Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs).
These findings highlight ConvKANs' potential for PD detection, emphasize the importance of 3D analysis in capturing subtle brain changes, and underscore cross-dataset generalization challenges.
arXiv (2024-07-24T16:04:18Z)
- Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model [3.4248731707266264]
In neuroimaging, brain CT is generally more cost-effective and accessible than MRI.
Medical image-to-image translation (I2I) offers a promising way to synthesize MRI from CT.
This study is the first to achieve high-quality 3D medical I2I based only on a 2D diffusion model (DM), with no extra architectural models.
arXiv (2024-07-06T12:13:36Z)
- CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates that it can identify organs and abnormalities in a zero-shot manner using natural language.
arXiv (2024-04-23T17:59:01Z)
- Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images [10.538839084727975]
Tumor visibility in 2D kV images is constrained because the patient's anatomy is projected onto a 2D plane.
In treatment rooms with 3D-OBI such as cone-beam CT (CBCT), the CBCT field of view (FOV) is limited and the imaging dose is unnecessarily high.
We propose a dual-model framework built with hierarchical ViT blocks to reconstruct 3D CT from kV images obtained at the treatment position.
arXiv (2024-04-01T19:55:03Z)
- 3D-GMIC: an efficient deep neural network to find small objects in large 3D images [41.334361182700164]
3D imaging enables a more accurate diagnosis by providing spatial information about organ anatomy.
Using 3D images to train AI models is computationally challenging because they consist of tens or hundreds of times more pixels than their 2D counterparts.
We propose a novel neural network architecture that enables computationally efficient classification of 3D medical images in their full resolution.
arXiv (2022-10-16T21:58:54Z)
- Moving from 2D to 3D: volumetric medical image classification for rectal cancer staging [62.346649719614]
Preoperative discrimination between T2 and T3 stages is arguably both the most challenging and the most clinically significant task in rectal cancer treatment.
We present a volumetric convolutional neural network that accurately discriminates T2 from T3 stage rectal cancer using rectal MR volumes.
arXiv (2022-09-13T07:10:14Z)
- 3-Dimensional Deep Learning with Spatial Erasing for Unsupervised Anomaly Segmentation in Brain MRI [55.97060983868787]
We investigate whether increasing spatial context, by using MRI volumes combined with spatial erasing, improves unsupervised anomaly segmentation performance.
We compare 2D variational autoencoders (VAEs) to their 3D counterparts, propose 3D input erasing, and systematically study the impact of the dataset size on performance.
Our best-performing 3D VAE with input erasing achieves an average Dice score of 31.40%, compared with 25.76% for the 2D VAE.
arXiv (2021-09-14T09:17:27Z)
- 3D RegNet: Deep Learning Model for COVID-19 Diagnosis on Chest CT Image [9.407002591446286]
A 3D-RegNet-based neural network is proposed for diagnosing the physical condition of patients with coronavirus (COVID-19) infection.
On the test set, the 3D model achieved an F1 score of 0.8379 and an AUC of 0.8807.
arXiv (2021-07-08T18:10:07Z)
- Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for 3D deep learning (DL) models for 3D chest CT scan classification.
We also apply the Class Activation Mapping (CAM) technique to our models to make their results interpretable.
arXiv (2021-01-14T03:45:01Z)
- Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv (2020-12-16T07:11:16Z)
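The MST abstract compares MST and ResNet AUCs with DeLong's test, and several related entries (e.g., the 3D RegNet study) also report AUC. The sketch below shows one way to compute the empirical AUC (the Mann-Whitney statistic) and DeLong's test for two correlated AUCs measured on the same cases. It is a straightforward O(m·n) implementation written for illustration; `delong_test` and `_structural_components` are hypothetical names, and this is not the code used in any of the listed papers.

```python
import math

import numpy as np


def _structural_components(pos: np.ndarray, neg: np.ndarray):
    """Per-case DeLong components using the kernel psi(x, y) = 1 if x > y,
    0.5 if tied, 0 otherwise (x: positive-case score, y: negative-case score)."""
    diff = pos[:, None] - neg[None, :]  # (m, n) pairwise comparisons
    psi = (diff > 0) + 0.5 * (diff == 0)
    # V10: one value per positive case; V01: one per negative case; overall mean = AUC.
    return psi.mean(axis=1), psi.mean(axis=0), float(psi.mean())


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs on the same cases.
    Returns (auc_a, auc_b, p_value)."""
    y = np.asarray(y_true).astype(bool)
    v10, v01, aucs = [], [], []
    for s in (np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)):
        comp10, comp01, auc = _structural_components(s[y], s[~y])
        v10.append(comp10)
        v01.append(comp01)
        aucs.append(auc)
    m, n = int(y.sum()), int((~y).sum())
    # 2x2 covariance of the two AUC estimates (np.cov treats rows as variables).
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 0.0:
        # Zero estimated variance: z is 0/0 (report p = 1) or x/0 (p -> 0).
        p_value = 1.0 if aucs[0] == aucs[1] else 0.0
        return aucs[0], aucs[1], p_value
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return aucs[0], aucs[1], p_value


# Hand-checkable example: model A separates perfectly, model B is at chance.
auc_a, auc_b, p = delong_test([1, 1, 0, 0], [0.9, 0.8, 0.1, 0.2], [0.6, 0.1, 0.5, 0.2])
print(auc_a, auc_b, round(p, 4))  # → 1.0 0.5 0.3173
```

Because both models are scored on the same patients, the test uses the covariance between their per-case components rather than treating the two AUCs as independent estimates.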