Interpretable Solutions for Breast Cancer Diagnosis with Grammatical
Evolution and Data Augmentation
- URL: http://arxiv.org/abs/2401.14255v1
- Date: Thu, 25 Jan 2024 15:45:28 GMT
- Title: Interpretable Solutions for Breast Cancer Diagnosis with Grammatical
Evolution and Data Augmentation
- Authors: Yumnah Hasan, Allan de Lima, Fatemeh Amerehi, Darian Reyes Fernandez
de Bulnes, Patrick Healy, and Conor Ryan
- Abstract summary: We show how a new synthetic data generation technique, STEM, can be used to produce data to train models produced by Grammatical Evolution (GE).
We test our technique on the Digital Database for Screening Mammography (DDSM) and the Wisconsin Breast Cancer (WBC) datasets.
We demonstrate that the GE-derived models present the best AUC while still maintaining interpretable solutions.
- Score: 0.15705429611931054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical imaging diagnosis increasingly relies on Machine Learning (ML)
models. This task is often hampered by severely imbalanced datasets,
where positive cases can be quite rare. The use of such models is further
compromised by their limited interpretability, which is becoming increasingly important. While
post-hoc interpretability techniques such as SHAP and LIME have been used with
some success on so-called black box models, the use of inherently
understandable models makes such endeavors more fruitful. This paper addresses
these issues by demonstrating how a relatively new synthetic data generation
technique, STEM, can be used to produce data to train models produced by
Grammatical Evolution (GE) that are inherently understandable. STEM is a
recently introduced combination of the Synthetic Minority Oversampling
Technique (SMOTE), Edited Nearest Neighbour (ENN), and Mixup; it has previously
been successfully used to tackle both between-class and within-class imbalance
issues. We test our technique on the Digital Database for Screening Mammography
(DDSM) and the Wisconsin Breast Cancer (WBC) datasets and compare Area Under
the Curve (AUC) results with an ensemble of the top three performing
classifiers from a set of eight standard ML classifiers with varying degrees of
interpretability. We demonstrate that the GE-derived models present the best
AUC while still maintaining interpretable solutions.
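The three ingredients that the abstract names for STEM (SMOTE, ENN, and Mixup) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names, the order in which the steps are chained, the neighbour count k=3, the Mixup parameter alpha=0.2, and the toy two-dimensional data are all assumptions made for illustration.

```python
import math
import random

def nearest(idx, points, k):
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding idx."""
    others = (j for j in range(len(points)) if j != idx)
    return sorted(others, key=lambda j: math.dist(points[idx], points[j]))[:k]

def smote(minority, n_new, k=3, rng=random):
    """SMOTE-style oversampling: interpolate between a minority sample
    and one of its k nearest minority neighbours (needs >= 2 samples)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, k))
        gap = rng.random()  # position along the segment from sample i to sample j
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic

def enn(X, y, k=3):
    """Edited Nearest Neighbour (simplified): keep a sample only if a strict
    majority of its k nearest neighbours share its label."""
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in nearest(i, X, k)]
        if votes.count(y[i]) * 2 > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def mixup(X, y, n_new, alpha=0.2, rng=random):
    """Mixup: convex combinations of random sample pairs and of their labels,
    with the mixing weight drawn from Beta(alpha, alpha)."""
    Xm, ym = [], []
    for _ in range(n_new):
        i, j = rng.randrange(len(X)), rng.randrange(len(X))
        lam = rng.betavariate(alpha, alpha)
        Xm.append([lam * a + (1 - lam) * b for a, b in zip(X[i], X[j])])
        ym.append(lam * y[i] + (1 - lam) * y[j])
    return Xm, ym

# Toy imbalanced data (hypothetical): 40 majority points, 8 minority points.
rng = random.Random(0)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(8)]

new_min = smote(minority, 32, rng=rng)   # oversample the minority class to 40
X = majority + minority + new_min
y = [0] * 40 + [1] * 40
X, y = enn(X, y)                         # remove samples that disagree with their neighbours
Xm, ym = mixup(X, y, 20, rng=rng)        # add soft-labelled mixed samples
```

Mixup produces soft labels between 0 and 1, so a downstream classifier would need either to accept them directly or to threshold them; which choice the paper makes is not stated in the abstract.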
Related papers
- MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools.
Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses.
We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z)
- STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup [0.20482269513546458]
Imbalanced datasets in medical imaging are characterized by skewed class proportions and scarcity of abnormal cases.
This paper investigates the potential of using Mixup augmentation to generate new data points as a generic vicinal distribution.
We focus on the breast cancer problem, where imbalanced datasets are prevalent.
arXiv Detail & Related papers (2023-11-13T17:45:28Z)
- MCRAGE: Synthetic Healthcare Data for Fairness [3.0089659534785853]
We propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE) to augment imbalanced datasets.
MCRAGE involves training a Conditional Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes.
We use this synthetic data to augment the existing imbalanced dataset, resulting in a more balanced distribution across all classes.
arXiv Detail & Related papers (2023-10-27T19:02:22Z)
- SC-MIL: Supervised Contrastive Multiple Instance Learning for Imbalanced Classification in Pathology [2.854576370929018]
Machine learning problems in medical imaging often deal with rare diseases.
In pathology images, there is another level of imbalance, where given a positively labeled Whole Slide Image (WSI), only a fraction of pixels within it contribute to the positive label.
We propose a joint-training MIL framework in the presence of label imbalance that progressively transitions from learning bag-level representations to optimal classifier learning.
arXiv Detail & Related papers (2023-03-23T16:28:15Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis.
We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data.
A multimodal fusion consisting of text and image data is designed to infer the marginal probability of lung disease.
arXiv Detail & Related papers (2022-02-09T04:12:30Z)
- METGAN: Generative Tumour Inpainting and Modality Synthesis in Light Sheet Microscopy [4.872960046536882]
We introduce a novel generative method which leverages real anatomical information to generate realistic image-label pairs of tumours.
We construct a dual-pathway generator, for the anatomical image and label, trained in a cycle-consistent setup, constrained by an independent, pretrained segmentor.
The generated images yield significant quantitative improvement compared to existing methods.
arXiv Detail & Related papers (2021-04-22T11:18:17Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)