Coupling Deep Imputation with Multitask Learning for Downstream Tasks on
Genomics Data
- URL: http://arxiv.org/abs/2204.13705v1
- Date: Thu, 28 Apr 2022 09:48:15 GMT
- Title: Coupling Deep Imputation with Multitask Learning for Downstream Tasks on
Genomics Data
- Authors: Sophie Peacock, Etai Jacob, Nikolay Burlutskiy
- Abstract summary: In this paper we investigate how imputing data with missing values using deep learning and multitask learning can help to reach state-of-the-art performance results.
We propose a generalised deep imputation method to impute values where a patient has all modalities present except one.
In contrast, when using all modalities for survival prediction we observe that multitask learning alone outperforms deep imputation alone with statistical significance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Genomics data such as RNA gene expression, methylation and micro RNA
expression are valuable sources of information for various clinical predictive
tasks. For example, predicting survival outcomes, cancer histology type and
other patient-related information is possible using not only clinical data
but molecular data as well. Moreover, using these data sources together, for
example in multitask learning, can boost performance. However, in practice,
many data points are missing, which substantially reduces the number of
patients available when analysing full cases, i.e. cases in which all
modalities are present.
In this paper we investigate how imputing data with missing values using deep
learning coupled with multitask learning can help to reach state-of-the-art
performance results using combined genomics modalities, RNA, micro RNA and
methylation. We propose a generalised deep imputation method to impute values
where a patient has all modalities present except one. Interestingly enough,
deep imputation alone outperforms multitask learning alone for the
classification and regression tasks across most combinations of modalities. In
contrast, when using all modalities for survival prediction we observe that
multitask learning alone outperforms deep imputation alone with statistical
significance (adjusted p-value 0.03). Thus, both approaches are complementary
when optimising performance for downstream predictive tasks.
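The paper does not include code. As a rough, dependency-light sketch of the modality-imputation idea it describes — learning to predict one missing modality from the others on complete cases — the deep network can be stood in for by a single linear map fitted by least squares; all variable names, feature sizes and the synthetic data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 100 "complete" patients with three modalities.
# Feature sizes are arbitrary; real RNA/miRNA/methylation matrices are far wider.
n, d_rna, d_mirna, d_meth = 100, 20, 10, 15
rna = rng.normal(size=(n, d_rna))
mirna = rng.normal(size=(n, d_mirna))
# Make methylation partly predictable from the other two modalities.
true_w = rng.normal(size=(d_rna + d_mirna, d_meth))
meth = np.hstack([rna, mirna]) @ true_w + 0.1 * rng.normal(size=(n, d_meth))

def fit_imputer(present, missing):
    """Fit a linear map from the present modalities to the missing one
    on complete cases (least squares stands in for the deep network)."""
    w, *_ = np.linalg.lstsq(present, missing, rcond=None)
    return w

def impute(present, w):
    """Predict the missing modality for patients lacking it."""
    return present @ w

# Train on complete cases, then impute for a patient missing methylation.
w = fit_imputer(np.hstack([rna, mirna]), meth)
new_patient = rng.normal(size=(1, d_rna + d_mirna))
imputed_meth = impute(new_patient, w)
print(imputed_meth.shape)  # prints (1, 15)
```

In the paper's setting the linear map would be replaced by a trained deep network per missing modality, and the imputed rows would be appended to the training set for the downstream classification, regression or survival task.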
Related papers
- Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z)
- DRIM: Learning Disentangled Representations from Incomplete Multimodal Healthcare Data [0.0]
Real-life medical data is often multimodal and incomplete, fueling the need for advanced deep learning models.
We introduce DRIM, a new method for capturing shared and unique representations, despite data sparsity.
Our method outperforms state-of-the-art algorithms on glioma patients survival prediction tasks, while being robust to missing modalities.
arXiv Detail & Related papers (2024-09-25T16:13:57Z)
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- Time-dependent Iterative Imputation for Multivariate Longitudinal Clinical Data [0.0]
Time-Dependent Iterative imputation offers a practical solution for imputing time-series data.
When applied to a cohort consisting of more than 500,000 patient observations, our approach outperformed state-of-the-art imputation methods.
arXiv Detail & Related papers (2023-04-16T16:10:49Z)
- Deep neural networks approach to microbial colony detection -- a comparative analysis [52.77024349608834]
This study investigates the performance of three deep learning approaches for object detection on the AGAR dataset.
The achieved results may serve as a benchmark for future experiments.
arXiv Detail & Related papers (2021-08-23T12:06:00Z)
- TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis [15.483078145498085]
Under-representation of groups in datasets can lead to inaccurate predictions for certain groups, which can exacerbate systemic discrimination issues.
We propose TRAPDOOR, a methodology for identification of biased datasets by repurposing a technique that has been mostly proposed for nefarious purposes: Neural network backdoors.
Using a real-world cancer dataset, we analyze the dataset with the bias that already existed towards white individuals and also introduced biases in datasets artificially.
arXiv Detail & Related papers (2021-08-14T17:02:02Z)
- MuCoMiD: A Multitask Convolutional Learning Framework for miRNA-Disease Association Prediction [0.4061135251278187]
We propose a novel multi-tasking convolution-based approach, which we refer to as MuCoMiD.
MuCoMiD allows automatic feature extraction while incorporating knowledge from 4 heterogeneous biological information sources.
We construct large-scale experiments on standard benchmark datasets as well as our proposed larger independent test sets and case studies.
MuCoMiD shows an improvement of at least 5% in 5-fold CV evaluation on HMDDv2.0 and HMDDv3.0 datasets and at least 49% on larger independent test sets with unseen diseases over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-08T10:01:46Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.