Predicting Survival Outcomes in the Presence of Unlabeled Data
- URL: http://arxiv.org/abs/2210.13891v1
- Date: Tue, 25 Oct 2022 10:19:45 GMT
- Title: Predicting Survival Outcomes in the Presence of Unlabeled Data
- Authors: Fateme Nateghi Haredasht, Celine Vens
- Abstract summary: We investigate whether we can benefit from the inclusion of such unlabeled data instances to predict accurate survival times.
We propose three approaches to deal with this novel setting and provide an empirical comparison over fifteen real-life clinical and gene expression survival datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many clinical studies require the follow-up of patients over time. This is
challenging: apart from frequently observed drop-out, there are often also
organizational and financial challenges, which can lead to reduced data
collection and, in turn, can complicate subsequent analyses. In contrast, there
is often plenty of baseline data available for patients with similar
characteristics and background information, e.g., from patients who fall
outside the study time window. In this article, we investigate whether we can
benefit from the inclusion of such unlabeled data instances to predict accurate
survival times. In other words, we introduce a third level of supervision in
the context of survival analysis: apart from fully observed and censored
instances, we also include unlabeled instances. We propose three approaches to
deal with this novel setting and provide an empirical comparison over fifteen
real-life clinical and gene expression survival datasets. Our results
demonstrate that all approaches are able to increase the predictive performance
over independent test data. We also show that integrating the partial
supervision provided by censored data in a semi-supervised wrapper approach
generally provides the best results, often achieving large improvements
compared to not using unlabeled data.
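The three levels of supervision named in the abstract (fully observed, censored, unlabeled) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Instance` class, the nearest-neighbour base learner, the single pseudo-labeling round, and the rule that a pseudo-label for a censored instance may not fall below its censoring time are all hypothetical choices, not the authors' three approaches.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Instance:
    x: Sequence[float]            # baseline features
    time: Optional[float] = None  # event or censoring time; None if unlabeled
    event: Optional[bool] = None  # True = observed, False = censored, None = unlabeled


def supervision_level(inst: Instance) -> str:
    """Return 'observed', 'censored', or 'unlabeled' for one instance."""
    if inst.time is None:
        return "unlabeled"
    return "observed" if inst.event else "censored"


def knn_predict(train: List[Tuple[Sequence[float], float]],
                x: Sequence[float], k: int = 1) -> float:
    """Hypothetical base learner: mean time of the k nearest training points."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def pseudo_label_round(instances: List[Instance],
                       k: int = 1) -> List[Tuple[Sequence[float], float]]:
    """One round of pseudo-labeling (an illustrative wrapper, not the paper's).

    Observed instances keep their true times. Censored and unlabeled
    instances receive a predicted time; for censored instances the
    prediction is clipped from below at the censoring time, since the
    event is known not to have happened before it.
    """
    labeled = [(i.x, i.time) for i in instances
               if supervision_level(i) == "observed"]
    pseudo = []
    for inst in instances:
        level = supervision_level(inst)
        if level == "observed":
            continue
        pred = knn_predict(labeled, inst.x, k)
        if level == "censored":
            pred = max(pred, inst.time)  # assumed use of partial supervision
        pseudo.append((inst.x, pred))
    return labeled + pseudo
```

In this sketch the enlarged training set returned by `pseudo_label_round` would be used to refit the base learner; a practical wrapper would likely repeat the round and keep only confident pseudo-labels.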
Related papers
- Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations [19.560652381770243]
We introduce a novel framework that simultaneously handles incomplete data across modalities and censored survival labels.
Our approach employs advanced foundation models to encode individual modalities and align them into a universal representation space.
The proposed method demonstrates outstanding prediction accuracy in two survival analysis tasks on both employed datasets.
arXiv Detail & Related papers (2024-07-25T02:55:39Z)
- SurvTimeSurvival: Survival Analysis On The Patient With Multiple Visits/Records [26.66492761632773]
The accurate prediction of survival times for patients with severe diseases remains a critical challenge despite recent advances in artificial intelligence.
This study introduces "SurvTimeSurvival: Survival Analysis On Patients With Multiple Visits/Records".
arXiv Detail & Related papers (2023-11-16T12:30:14Z)
- Contrastive Learning of Temporal Distinctiveness for Survival Analysis in Electronic Health Records [10.192973297290136]
We propose a novel Ontology-aware Temporality-based Contrastive Survival (OTCSurv) analysis framework.
OTCSurv uses survival durations from both censored and observed data to define temporal distinctiveness.
We conduct experiments using a large EHR dataset to forecast the risk of hospitalized patients who are in danger of developing acute kidney injury (AKI).
arXiv Detail & Related papers (2023-08-24T22:36:22Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Leveraging Unlabelled Data in Multiple-Instance Learning Problems for Improved Detection of Parkinsonian Tremor in Free-Living Conditions [80.88681952022479]
We introduce a new method for combining semi-supervised with multiple-instance learning.
We show that by leveraging the unlabelled data of 454 subjects we can achieve large performance gains in per-subject tremor detection.
arXiv Detail & Related papers (2023-04-29T12:25:10Z)
- A Deep Variational Approach to Clustering Survival Data [5.871238645229228]
We introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting.
Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times.
arXiv Detail & Related papers (2021-06-10T14:10:25Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
arXiv Detail & Related papers (2020-06-15T20:48:43Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
- Overly Optimistic Prediction Results on Imbalanced Data: a Case Study of Flaws and Benefits when Applying Over-sampling [13.463035357173045]
We focus on one specific type of methodological flaw: applying over-sampling before partitioning the data into mutually exclusive training and testing sets.
We show how this causes the results to be biased using two artificial datasets and reproduce results of studies in which this flaw was identified.
arXiv Detail & Related papers (2020-01-15T12:53:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.