A Systematic Study of Deep Learning Models and xAI Methods for Region-of-Interest Detection in MRI Scans
- URL: http://arxiv.org/abs/2508.14151v2
- Date: Thu, 21 Aug 2025 08:09:44 GMT
- Title: A Systematic Study of Deep Learning Models and xAI Methods for Region-of-Interest Detection in MRI Scans
- Authors: Justin Yiu, Kushank Arora, Daniel Steinberg, Rohit Ghiya,
- Abstract summary: This study presents a systematic evaluation of various deep learning architectures combined with explainable AI (xAI) techniques for automated region of interest detection in knee MRI scans.<n>We investigate both supervised and self-supervised approaches, including ResNet50, InceptionV3, Vision Transformers (ViT), and multiple U-Net variants augmented with multi-layer perceptron (MLP) classifiers.<n>Our results demonstrate that ResNet50 consistently excels in classification and ROI identification, outperforming transformer-based models under the constraints of the MRNet dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Magnetic Resonance Imaging (MRI) is an essential diagnostic tool for assessing knee injuries. However, manual interpretation of MRI slices remains time-consuming and prone to inter-observer variability. This study presents a systematic evaluation of various deep learning architectures combined with explainable AI (xAI) techniques for automated region of interest (ROI) detection in knee MRI scans. We investigate both supervised and self-supervised approaches, including ResNet50, InceptionV3, Vision Transformers (ViT), and multiple U-Net variants augmented with multi-layer perceptron (MLP) classifiers. To enhance interpretability and clinical relevance, we integrate xAI methods such as Grad-CAM and Saliency Maps. Model performance is assessed using AUC for classification and PSNR/SSIM for reconstruction quality, along with qualitative ROI visualizations. Our results demonstrate that ResNet50 consistently excels in classification and ROI identification, outperforming transformer-based models under the constraints of the MRNet dataset. While hybrid U-Net + MLP approaches show potential for leveraging spatial features in reconstruction and interpretability, their classification performance remains lower. Grad-CAM consistently provided the most clinically meaningful explanations across architectures. Overall, CNN-based transfer learning emerges as the most effective approach for this dataset, while future work with larger-scale pretraining may better unlock the potential of transformer models.
Related papers
- Seeing Isn't Always Believing: Analysis of Grad-CAM Faithfulness and Localization Reliability in Lung Cancer CT Classification [0.0]
This study investigates whether Grad-CAM truly represents the internal decision-making of deep models trained for lung cancer image classification.<n>We introduce a quantitative evaluation framework that combines localization accuracy, perturbation-based faithfulness, and explanation consistency to assess Grad-CAM reliability.<n>Our findings aim to inspire a more cautious and rigorous adoption of visual explanation tools in medical AI, urging the community to rethink what it truly means to "trust" a model's explanation.
arXiv Detail & Related papers (2026-01-19T08:35:59Z) - Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies.<n>Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs.<n>We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z) - Meta knowledge assisted Evolutionary Neural Architecture Search [38.55611683982936]
This paper introduces an efficient EC-based NAS method to solve problems via an innovative meta-learning framework.<n>An adaptive surrogate model is designed through an adaptive threshold to select the potential architectures.<n> Experiments on CIFAR-10, CIFAR-100, and ImageNet1K datasets demonstrate that the proposed method achieves high performance comparable to that of many state-of-the-art peer methods.
arXiv Detail & Related papers (2025-04-30T11:43:07Z) - Multi-Scale Transformer Architecture for Accurate Medical Image Classification [4.578375402082224]
This study introduces an AI-driven skin lesion classification algorithm built on an enhanced Transformer architecture.<n>By integrating a multi-scale feature fusion mechanism and refining the self-attention process, the model effectively extracts both global and local features.<n>Performance evaluation on the ISIC 2017 dataset demonstrates that the improved Transformer surpasses established AI models.
arXiv Detail & Related papers (2025-02-10T08:22:25Z) - Rethinking model prototyping through the MedMNIST+ dataset collection [0.11999555634662634]
This work introduces a comprehensive benchmark for the MedMNIST+ dataset collection.<n>We reassess commonly used Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures across distinct medical datasets.<n>Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training.
arXiv Detail & Related papers (2024-04-24T10:19:25Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - You Only Train Once: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment [45.62136459502005]
We propose a network to perform full reference (FR) and no reference (NR) IQA.
We first employ an encoder to extract multi-level features from input images.
A Hierarchical Attention (HA) module is proposed as a universal adapter for both FR and NR inputs.
A Semantic Distortion Aware (SDA) module is proposed to examine feature correlations between shallow and deep layers of the encoder.
arXiv Detail & Related papers (2023-10-14T11:03:04Z) - Convolutional neural network based on sparse graph attention mechanism
for MRI super-resolution [0.34410212782758043]
Medical image super-resolution (SR) reconstruction using deep learning techniques can enhance lesion analysis and assist doctors in improving diagnostic efficiency and accuracy.
Existing deep learning-based SR methods rely on convolutional neural networks (CNNs), which inherently limit the expressive capabilities of these models.
We propose an A-network that utilizes multiple convolution operator feature extraction modules (MCO) for extracting image features.
arXiv Detail & Related papers (2023-05-29T06:14:22Z) - RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional
Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
New convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
Model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z) - CNN-LSTM Based Multimodal MRI and Clinical Data Fusion for Predicting
Functional Outcome in Stroke Patients [1.5250925845050138]
Clinical outcome prediction plays an important role in stroke patient management.
From a machine learning point-of-view, one of the main challenges is dealing with heterogeneous data.
In this paper a multimodal convolutional neural network - long short-term memory (CNN-LSTM) based ensemble model is proposed.
arXiv Detail & Related papers (2022-05-11T14:46:01Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for
Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z) - Domain Shift in Computer Vision models for MRI data analysis: An
Overview [64.69150970967524]
Machine learning and computer vision methods are showing good performance in medical imagery analysis.
Yet only a few applications are now in clinical use.
Poor transferability of themodels to data from different sources or acquisition domains is one of the reasons for that.
arXiv Detail & Related papers (2020-10-14T16:34:21Z) - MS-Net: Multi-Site Network for Improving Prostate Segmentation with
Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.