DRIVE: Machine Learning to Identify Drivers of Cancer with
High-Dimensional Genomic Data & Imputed Labels
- URL: http://arxiv.org/abs/2105.00469v1
- Date: Sun, 2 May 2021 13:27:31 GMT
- Title: DRIVE: Machine Learning to Identify Drivers of Cancer with
High-Dimensional Genomic Data & Imputed Labels
- Authors: Adnan Akbar, Andrey Solovyev, John W Cassidy, Nirmesh Patel, Harry W
Clifford
- Abstract summary: We propose a novel combination method for driver mutation identification.
It uses the power of both statistical modelling and functional-impact based methods.
Initial results show this approach outperforms the state-of-the-art methods in terms of precision.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying the mutations that drive cancer growth is key in clinical
decision making and precision oncology. As driver mutations confer selective
advantage and thus have an increased likelihood of occurrence, frequency-based
statistical models are currently favoured. These methods are not suited to
rare, low-frequency driver mutations. The alternative approach is to use
functional-impact scores; however, methods using this approach are
highly prone to false positives. In this paper, we propose a novel combination
method for driver mutation identification, which uses the power of both
statistical modelling and functional-impact based methods. Initial results show
this approach outperforms the state-of-the-art methods in terms of precision,
and provides comparable performance in terms of area under receiver operating
characteristic curves (AU-ROC). We believe that data-driven systems based on
machine learning, such as these, will become an integral part of precision
oncology in the near future.
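The abstract does not say how the statistical and functional-impact signals are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each mutation has a frequency-based score and a functional-impact score in [0, 1], combines them with a simple weighted average, and computes the two metrics the abstract mentions (precision and AU-ROC). The function names, the weighting scheme, the threshold, and all toy values are hypothetical and invented for illustration.

```python
from typing import List, Sequence


def combine_scores(freq: Sequence[float], impact: Sequence[float],
                   weight: float = 0.5) -> List[float]:
    """Weighted average of two per-mutation scores (assumed to lie in [0, 1])."""
    return [weight * f + (1.0 - weight) * i for f, i in zip(freq, impact)]


def precision_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> float:
    """Fraction of mutations scored >= threshold that are labelled drivers."""
    called = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(called) / len(called) if called else 0.0


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AU-ROC as the probability that a random driver outranks a random
    passenger, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one driver and one passenger")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration only: 1 = driver, 0 = passenger.
labels = [1, 1, 1, 0, 0, 0]
freq_score = [0.9, 0.8, 0.1, 0.2, 0.1, 0.7]    # third driver is rare
impact_score = [0.8, 0.9, 0.9, 0.9, 0.2, 0.1]  # fourth item is a high-impact passenger
combined = combine_scores(freq_score, impact_score)

for name, scores in [("frequency", freq_score),
                     ("impact", impact_score),
                     ("combined", combined)]:
    print(name, round(precision_at(scores, labels, 0.6), 3),
          round(auroc(scores, labels), 3))
```

On real data the weight and threshold would need to be chosen on held-out examples, and a learned model could replace the fixed average.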
Related papers
- Identifying actionable driver mutations in lung cancer using an efficient Asymmetric Transformer Decoder [9.503365381306963]
This study evaluates various Multiple Instance Learning (MIL) techniques to detect six key actionable NSCLC driver mutations.
We introduce an Asymmetric Transformer Decoder model that employs queries and key-values of varying dimensions to maintain a low query dimensionality.
Our method outperforms top MIL models by an average of 3%, and over 4% when predicting rare mutations such as ERBB2 and BRAF.
arXiv Detail & Related papers (2025-08-04T13:50:00Z) - Improving statistical learning methods via features selection without replacement sampling and random projection [0.680740878601496]
Cancer is a genetic disease characterized by genetic and epigenetic alterations that disrupt normal gene expression.
High-dimensional microarray datasets pose challenges for classification models due to the "small n, large p" problem.
This study contributes to cancer biomarker discovery, offering a robust computational method for analyzing microarray data.
arXiv Detail & Related papers (2025-05-28T22:36:46Z) - Learning Penalty for Optimal Partitioning via Automatic Feature Extraction [0.0]
Changepoint detection identifies significant shifts in data sequences, making it important in areas like finance, genetics, and healthcare.
Optimal Partitioning algorithms efficiently detect these changes, using a penalty parameter to limit the number of changepoints.
This study proposes a novel approach that uses recurrent neural networks to learn this penalty directly from raw sequences by automatically extracting features.
arXiv Detail & Related papers (2025-05-12T10:07:55Z) - Enhancing stroke disease classification through machine learning models via a novel voting system by feature selection techniques [1.2302586529345994]
Heart disease remains a leading cause of morbidity and mortality worldwide.
We have developed a novel voting system with feature selection techniques to advance heart disease classification.
XGBoost demonstrated exceptional performance, achieving 99% accuracy, precision, F1-Score, 98% recall, and 100% ROC AUC.
arXiv Detail & Related papers (2025-04-01T07:16:49Z) - A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.
We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods.
By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z) - Effect sizes as a statistical feature-selector-based learning to detect breast cancer [0.0]
Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale.
In this work, an algorithm and experimental results demonstrate the feasibility of developing a statistical feature-selector-based learning tool.
arXiv Detail & Related papers (2024-11-11T11:07:38Z) - Targeted Cause Discovery with Data-Driven Learning [66.86881771339145]
We propose a novel machine learning approach for inferring causal variables of a target variable from observations.
We employ a neural network trained to identify causality through supervised learning on simulated data.
Empirical results demonstrate the effectiveness of our method in identifying causal relationships within large-scale gene regulatory networks.
arXiv Detail & Related papers (2024-08-29T02:21:11Z) - Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI [0.0]
We evaluate and compare the classification accuracy, precision, recall, and F-1 scores of five different machine learning methods.
XGBoost achieved the best model accuracy, which is 97%.
arXiv Detail & Related papers (2024-04-06T17:23:21Z) - Decision Forest Based EMG Signal Classification with Low Volume Dataset
Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - DriPP: Driven Point Processes to Model Stimuli Induced Patterns in M/EEG
Signals [62.997667081978825]
We develop a novel statistical point process model, called driven temporal point processes (DriPP).
We derive a fast and principled expectation-maximization (EM) algorithm to estimate the parameters of this model.
Results on standard MEG datasets demonstrate that our methodology reveals event-related neural responses.
arXiv Detail & Related papers (2021-12-08T13:07:21Z) - Algorithmic encoding of protected characteristics and its implications
on disparities across subgroups [17.415882865534638]
Machine learning models may pick up undesirable correlations between a patient's racial identity and clinical outcome.
Very little is known about how these biases are encoded and how one may reduce or even remove disparate performance.
arXiv Detail & Related papers (2021-10-27T20:30:57Z) - Increased peak detection accuracy in over-dispersed ChIP-seq data with
supervised segmentation models [2.2559617939136505]
We show that the unconstrained multiple changepoint detection model, with alternative noise assumptions and a suitable setup, reduces the over-dispersion exhibited by count data.
arXiv Detail & Related papers (2020-12-12T16:03:27Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant
Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to recast this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates a rectified meta-learning module into the common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z) - Bimodal Distribution Removal and Genetic Algorithm in Neural Network for
Breast Cancer Diagnosis [0.0]
This paper examines the effectiveness of Bimodal Distribution Removal (BDR) against the target cancer diagnosis classification problem.
The BDR process in fact negatively impacts classification performance.
This paper also explores the genetic algorithm as an efficient tool for feature selection.
arXiv Detail & Related papers (2020-02-20T13:51:40Z)