Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification
- URL: http://arxiv.org/abs/2410.10146v1
- Date: Mon, 14 Oct 2024 04:22:24 GMT
- Title: Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification
- Authors: Sadam Hussain, Mansoor Ali, Usman Naseem, Beatriz Alejandra Bosques Palomo, Mario Alexis Monsivais Molina, Jorge Alberto Garza Abdala, Daly Betzabeth Avendano Avalos, Servando Cardona-Huerta, T. Aaron Gulliver, Jose Gerardo Tamez Pena
- Abstract summary: Rising breast cancer (BC) occurrence and mortality are major global concerns for women.
Deep learning (DL) has demonstrated superior diagnostic performance in BC classification compared to human expert readers.
This study proposes a multimodal DL architecture for BC classification, utilising images (mammograms; four views) and textual data (radiological reports) from our new in-house dataset.
- Score: 7.405837346783951
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rising breast cancer (BC) occurrence and mortality are major global concerns for women. Deep learning (DL) has demonstrated superior diagnostic performance in BC classification compared to human expert readers. However, the predominant use of unimodal (digital mammography) features may limit the current performance of diagnostic models. To address this, we collected a novel multimodal dataset comprising both imaging and textual data. This study proposes a multimodal DL architecture for BC classification, utilising images (mammograms; four views) and textual data (radiological reports) from our new in-house dataset. Various augmentation techniques were applied to enlarge the training data for both the imaging and textual modalities. We explored the performance of eleven SOTA DL architectures (VGG16, VGG19, ResNet34, ResNet50, MobileNet-v3, EffNet-b0, EffNet-b1, EffNet-b2, EffNet-b3, EffNet-b7, and Vision Transformer (ViT)) as imaging feature extractors. For textual feature extraction, we utilised either artificial neural networks (ANNs) or long short-term memory (LSTM) networks. The imaging and textual features were then combined via late fusion and fed into an ANN classifier for BC classification. We evaluated different feature extractor and classifier arrangements. The VGG19 and ANN combination achieved the highest accuracy of 0.951. For precision, the VGG19 and ANN combination again surpassed the other CNN- and LSTM-based architectures with a score of 0.95. The best sensitivity score of 0.903 was achieved by VGG16+LSTM. The highest F1 score of 0.931 was achieved by VGG19+LSTM. VGG16+LSTM achieved the best area under the curve (AUC) of 0.937, with VGG19+LSTM following closely with an AUC of 0.929.
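As a concrete illustration of the late-fusion pipeline described in the abstract, here is a minimal PyTorch sketch pairing a VGG19 imaging branch with an LSTM textual branch and an ANN (MLP) classifier; the feature sizes, vocabulary, tokenisation, and classifier widths are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class LateFusionBC(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, text_hidden=128,
                 fused_dim=256, num_classes=2):
        super().__init__()
        # Imaging branch: VGG19 backbone pooled to a 512-d vector per view.
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.image_encoder = nn.Sequential(
            vgg.features, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Textual branch: token embedding + LSTM over the radiology report.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, text_hidden, batch_first=True)
        # Late fusion: concatenate both branches, classify with an ANN (MLP).
        self.classifier = nn.Sequential(
            nn.Linear(4 * 512 + text_hidden, fused_dim), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(fused_dim, num_classes))

    def forward(self, views, tokens):
        # views: (batch, 4, 3, H, W), the four mammographic views;
        # tokens: (batch, seq_len) integer-encoded report tokens.
        b = views.shape[0]
        img_feats = self.image_encoder(views.flatten(0, 1)).reshape(b, -1)
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.classifier(torch.cat([img_feats, h[-1]], dim=1))

# Example forward pass with dummy data.
model = LateFusionBC()
logits = model(torch.randn(2, 4, 3, 224, 224),
               torch.randint(1, 5000, (2, 64)))
```

In this sketch the four views share one VGG19 backbone, and their pooled features are concatenated with the final LSTM hidden state before classification; the paper's alternative ANN text branch would replace the LSTM with a feed-forward encoder over a fixed-size report representation.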
Related papers
- Deep Modeling and Optimization of Medical Image Classification [5.195343321287341]
We introduce a novel CLIP variant using four CNNs and eight ViTs as image encoders for the classification of brain cancer and skin cancer. We incorporate traditional machine learning (ML) methods to improve the generalization ability of these deep models on unseen domain data.
arXiv Detail & Related papers (2025-05-29T03:27:51Z)
- Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas.
This study aims to utilize a specially designed MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z)
- Brain Tumor Radiogenomic Classification [1.8276368987462532]
The RSNA-MICCAI brain tumor radiogenomic classification challenge aimed to predict MGMT biomarker status in glioblastoma through binary classification.
The dataset is split into three main cohorts: a training set and a validation set, which were used during training, and a testing set, which was used only during the final evaluation.
arXiv Detail & Related papers (2024-01-11T10:30:09Z)
- WATUNet: A Deep Neural Network for Segmentation of Volumetric Sweep Imaging Ultrasound [1.2903292694072621]
Volume sweep imaging (VSI) is an innovative approach that enables untrained operators to capture quality ultrasound images.
We present a novel segmentation model known as Wavelet_Attention_UNet (WATUNet).
In this model, we incorporate wavelet gates (WGs) and attention gates (AGs) between the encoder and decoder, instead of a simple connection, to overcome the limitations mentioned (a minimal attention-gate sketch appears after this list).
arXiv Detail & Related papers (2023-11-17T20:32:37Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Combination of multiple neural networks using transfer learning and extensive geometric data augmentation for assessing cellularity scores in histopathology images [0.0]
This work investigates the capabilities of two Deep Learning approaches to assess cancer cellularity in slide images.
The effects of training on rotation-augmented data, and of combining multiple architectures into a single network, were analyzed.
An additional architecture, combining the InceptionV3 network and VGG16 (a shallow, transfer-learned CNN) in parallel, was also evaluated.
arXiv Detail & Related papers (2022-11-09T04:29:15Z)
- Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using what the authors describe as the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- MARL: Multimodal Attentional Representation Learning for Disease Prediction [0.0]
Existing learning models often utilise CT-scan images to predict lung diseases.
These models suffer from high uncertainties that affect lung segmentation and visual feature learning.
We introduce MARL, a novel Multimodal Attentional Representation Learning model architecture.
arXiv Detail & Related papers (2021-05-01T17:47:40Z)
- Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for the 3D DL models for 3D chest CT scans classification.
We also exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results.
arXiv Detail & Related papers (2021-01-14T03:45:01Z)
- Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select activations from among the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, and Soft Root Sign (a minimal sketch of per-layer activation sampling also appears after this list).
arXiv Detail & Related papers (2020-11-24T01:53:39Z)
- Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best-performing model achieved an accuracy of 0.893 and a recall of 0.897, outperforming its baseline recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
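As referenced in the WATUNet entry above, the following is a minimal sketch of an additive attention gate placed on an encoder-decoder skip connection (in the style of Attention U-Net); WATUNet's wavelet gates and exact wiring are not reproduced here, and all channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate on a skip connection: encoder features are
    re-weighted by a mask computed from encoder and decoder activations."""
    def __init__(self, enc_ch, dec_ch, inter_ch):
        super().__init__()
        self.w_enc = nn.Conv2d(enc_ch, inter_ch, kernel_size=1)  # skip branch
        self.w_dec = nn.Conv2d(dec_ch, inter_ch, kernel_size=1)  # gating branch
        self.relu = nn.ReLU(inplace=True)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1),
                                 nn.Sigmoid())  # per-pixel weight in [0, 1]

    def forward(self, enc, dec):
        # enc, dec: feature maps at the same spatial resolution.
        attn = self.psi(self.relu(self.w_enc(enc) + self.w_dec(dec)))
        return enc * attn  # suppress irrelevant skip activations

# Example: gate 64-channel skip features with 64-channel decoder features.
gate = AttentionGate(enc_ch=64, dec_ch=64, inter_ch=32)
gated = gate(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```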
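As referenced in the stochastic-activation-selection entry above, this is a minimal sketch of building ensemble members whose activation layers are sampled per layer from a pool; the pool is limited to activations with stock PyTorch modules, and the CNN body is a toy stand-in rather than any architecture from that paper.

```python
import random
import torch
import torch.nn as nn

# Candidate activations (only those with stock PyTorch implementations:
# ReLU, leaky ReLU, Parametric ReLU, ELU, Swish (SiLU), and Mish).
ACTIVATION_POOL = [nn.ReLU, nn.LeakyReLU, nn.PReLU, nn.ELU, nn.SiLU, nn.Mish]

def make_member(in_ch=3, num_classes=2):
    """Build one ensemble member; each activation layer is drawn at random."""
    layers, ch = [], in_ch
    for out_ch in (32, 64, 128):
        layers += [nn.Conv2d(ch, out_ch, 3, padding=1),
                   random.choice(ACTIVATION_POOL)(),  # stochastic selection
                   nn.MaxPool2d(2)]
        ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Linear(ch, num_classes)]
    return nn.Sequential(*layers)

def ensemble_predict(members, x):
    """Average the softmax outputs of all ensemble members."""
    with torch.no_grad():
        return torch.stack([m(x).softmax(dim=1) for m in members]).mean(dim=0)

members = [make_member() for _ in range(5)]
probs = ensemble_predict(members, torch.randn(2, 3, 64, 64))
```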
This list is automatically generated from the titles and abstracts of the papers on this site.