DNA Methylation Data to Predict Suicidal and Non-Suicidal Deaths: A
Machine Learning Approach
- URL: http://arxiv.org/abs/2004.01819v1
- Date: Sat, 4 Apr 2020 00:34:22 GMT
- Title: DNA Methylation Data to Predict Suicidal and Non-Suicidal Deaths: A
Machine Learning Approach
- Authors: Rifat Zahan, Ian McQuillan and Nathaniel D. Osgood
- Abstract summary: The objective of this study is to predict suicidal and non-suicidal deaths from DNA methylation data using a modern machine learning algorithm.
We used support vector machines to classify existing secondary data consisting of normalized values of methylated DNA probe intensities.
Despite the use of cross-validation, the nominally perfect prediction of suicidal deaths for BA11 data suggests possible over-fitting of the model.
- Score: 1.2891210250935146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of this study is to predict suicidal and non-suicidal deaths
from DNA methylation data using a modern machine learning algorithm. We used
support vector machines to classify existing secondary data consisting of
normalized values of methylated DNA probe intensities from tissues of two
cortical brain regions to distinguish suicide cases from control cases. Before
classification, we employed Principal component analysis (PCA) and
t-distributed Stochastic Neighbor Embedding (t-SNE) to reduce the dimension of
the data. In comparison to PCA, the modern data visualization method t-SNE
performs better in dimensionality reduction. t-SNE accounts for the possible
non-linear patterns in low-dimensional data. We applied four-fold
cross-validation in which the resulting output from t-SNE was used as training
data for the Support Vector Machine (SVM). Despite the use of cross-validation,
the nominally perfect prediction of suicidal deaths for BA11 data suggests
possible over-fitting of the model. The study also may have suffered from
'spectrum bias' since the individuals were only studied from two extreme
scenarios. This research constitutes a baseline study for classifying suicidal
and non-suicidal deaths from DNA methylation data. Future studies with larger
sample size, while possibly incorporating methylation data from living
individuals, may reduce the bias and improve the accuracy of the results.
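The workflow described in the abstract (dimensionality reduction, then a four-fold cross-validated SVM) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn, and the synthetic data, the standardization step, the RBF kernel, the two output dimensions and the perplexity value are all placeholders chosen for the example rather than details taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the methylation matrix (NOT the study's data):
# 40 samples x 500 probes, labels 1 = suicide case, 0 = control.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))
y = np.repeat([0, 1], 20)

# Four folds, stratified so each fold keeps the case/control balance.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Variant A: PCA is refitted inside each training fold, so the held-out
# fold never influences the projection.
pca_svm = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="rbf"))
pca_scores = cross_val_score(pca_svm, X, y, cv=cv)

# Variant B: scikit-learn's TSNE has no transform() for unseen samples, so
# the embedding is computed once on ALL samples before cross-validation.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
tsne_scores = cross_val_score(SVC(kernel="rbf"), embedding, y, cv=cv)

print("PCA + SVM accuracy per fold:  ", pca_scores)
print("t-SNE + SVM accuracy per fold:", tsne_scores)
```

In variant B the embedding sees the held-out samples (though not their labels) before the folds are formed, so its cross-validated accuracy may be optimistic. Whether this played any part in the BA11 result is not something the abstract states.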
Related papers
- SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data [7.090283934070421]
Alzheimer's disease (AD) is a chronic neurodegenerative disorder and the leading cause of dementia.
Conventional approaches typically require the completion of all omics data at the outset to achieve optimal AD diagnosis.
We propose a novel staged graph convolutional network with uncertainty quantification (SGUQ)
arXiv Detail & Related papers (2024-10-14T19:51:32Z)
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
It is quite beneficial and challenging to detect poisoned samples from a mixed dataset.
We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- Ischemic Stroke Lesion Prediction using imbalanced Temporal Deep
Gaussian Process (iTDGP) [2.649401887836554]
Acute Ischemic Stroke (AIS) occurs when the blood supply to the brain is suddenly interrupted because of a blocked artery.
The current standard AIS assessment method, which thresholds the 3D measurement maps extracted from Computed Tomography Perfusion (CTP) images, is not accurate enough.
We propose imbalanced Temporal Deep Gaussian Process (iTDGP), a probabilistic model that can improve AIS prediction by using baseline Gaussian time series.
arXiv Detail & Related papers (2022-11-16T17:32:29Z)
- Compensating trajectory bias for unsupervised patient stratification
using adversarial recurrent neural networks [0.6323908398583082]
We show that patient embeddings and clusters might be impacted by a trajectory bias.
Results are dominated by the amount of data contained in each patient's trajectory, instead of clinically relevant details.
We present a method that can overcome this issue using an adversarial training scheme on top of a RNN-AE.
arXiv Detail & Related papers (2021-12-14T09:01:28Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance guided stochastic gradient descent (IGSGD) method to train models that perform inference on inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Quantification of pulmonary involvement in COVID-19 pneumonia by means
of a cascade of two U-nets: training and assessment on multiple datasets using
different annotation criteria [83.83783947027392]
This study aims at exploiting Artificial intelligence (AI) for the identification, segmentation and quantification of COVID-19 pulmonary lesions.
We developed an automated analysis pipeline, the LungQuant system, based on a cascade of two U-nets.
The accuracy of the LungQuant system in predicting the CT-Severity Score (CT-SS) has also been evaluated.
arXiv Detail & Related papers (2021-05-06T10:21:28Z)
- Federated Deep AUC Maximization for Heterogeneous Data with a Constant
Communication Complexity [77.78624443410216]
We propose improved FDAM algorithms for detecting heterogeneous chest data.
A result of this paper is that the communication of the proposed algorithm is strongly independent of the number of machines and also independent of the accuracy level.
Experiments have demonstrated the effectiveness of our FDAM algorithm on benchmark datasets and on medical chest X-ray images from different organizations.
arXiv Detail & Related papers (2021-02-09T04:05:19Z)
- Handling Non-ignorably Missing Features in Electronic Health Records
Data Using Importance-Weighted Autoencoders [8.518166245293703]
We propose a novel extension of VAEs called Importance-Weighted Autoencoders (IWAEs) to flexibly handle Missing Not At Random patterns in the Physionet data.
Our proposed method models the missingness mechanism using an embedded neural network, eliminating the need to specify the exact form of the missingness mechanism a priori.
arXiv Detail & Related papers (2021-01-18T22:53:29Z)
- DeepRite: Deep Recurrent Inverse TreatmEnt Weighting for Adjusting
Time-varying Confounding in Modern Longitudinal Observational Data [68.29870617697532]
We propose Deep Recurrent Inverse TreatmEnt weighting (DeepRite) for time-varying confounding in longitudinal data.
DeepRite is shown to recover the ground truth from synthetic data, and estimate unbiased treatment effects from real data.
arXiv Detail & Related papers (2020-10-28T15:05:08Z)
- Machine Learning and Data Science approach towards trend and predictors
analysis of CDC Mortality Data for the USA [0.0]
The study concluded (based on a sample) life expectancy regardless of gender, and their central tendencies; marital status also affected how frequent deaths were.
The study shows that machine learning predictions aren't as viable for the data as might be apparent.
arXiv Detail & Related papers (2020-09-11T12:46:57Z)
- CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors
and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)