Heterogeneous Datasets for Federated Survival Analysis Simulation
- URL: http://arxiv.org/abs/2301.12166v1
- Date: Sat, 28 Jan 2023 11:37:07 GMT
- Title: Heterogeneous Datasets for Federated Survival Analysis Simulation
- Authors: Alberto Archetti, Eugenio Lomurno, Francesco Lattari, André Martin, Matteo Matteucci
- Abstract summary: This work proposes a novel technique for constructing realistic heterogeneous datasets by starting from existing non-federated datasets in a reproducible way.
Specifically, we provide two novel dataset-splitting algorithms based on the Dirichlet distribution to assign each data sample to a carefully chosen client.
The implementation of the proposed methods is publicly available in favor of reproducibility and to encourage common practices to simulate federated environments for survival analysis.
- Score: 6.489759672413373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Survival analysis studies time-modeling techniques for an event of interest
occurring for a population. Survival analysis found widespread applications in
healthcare, engineering, and social sciences. However, the data needed to train
survival models are often distributed, incomplete, censored, and confidential.
In this context, federated learning can be exploited to tremendously improve
the quality of the models trained on distributed data while preserving user
privacy. However, federated survival analysis is still in its early
development, and there is no common benchmarking dataset to test federated
survival models. This work proposes a novel technique for constructing
realistic heterogeneous datasets by starting from existing non-federated
datasets in a reproducible way. Specifically, we provide two novel
dataset-splitting algorithms based on the Dirichlet distribution to assign each
data sample to a carefully chosen client: quantity-skewed splitting and
label-skewed splitting. Furthermore, these algorithms allow for obtaining
different levels of heterogeneity by changing a single hyperparameter. Finally,
numerical experiments provide a quantitative evaluation of the heterogeneity
level using log-rank tests and a qualitative analysis of the generated splits.
The implementation of the proposed methods is publicly available in favor of
reproducibility and to encourage common practices to simulate federated
environments for survival analysis.
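
The two splitting strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released implementation: the function names, the quantile binning of event times used as "labels", and the per-sample client sampling are all assumptions made for illustration. The one shared idea is that a symmetric Dirichlet distribution with concentration `alpha` controls heterogeneity, with smaller `alpha` giving more uneven splits.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha, ..., alpha) over k categories."""
    # Normalized Gamma(alpha, 1) draws follow a Dirichlet distribution.
    # The tiny floor only guards against numerical underflow for very small alpha.
    draws = [max(rng.gammavariate(alpha, 1.0), 1e-12) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def quantity_skewed_split(n_samples, n_clients, alpha, seed=0):
    """Hypothetical quantity-skewed split: client sizes follow Dirichlet proportions."""
    rng = random.Random(seed)
    proportions = sample_dirichlet(alpha, n_clients, rng)
    clients = [[] for _ in range(n_clients)]
    for idx in range(n_samples):
        # Each sample picks a client at random, weighted by the client proportions.
        client = rng.choices(range(n_clients), weights=proportions, k=1)[0]
        clients[client].append(idx)
    return clients


def label_skewed_split(times, n_clients, alpha, n_bins=4, seed=0):
    """Hypothetical label-skewed split: each time bin is spread unevenly over clients."""
    rng = random.Random(seed)
    # Assumption: equal-frequency bins of the event times play the role of labels.
    order = sorted(range(len(times)), key=lambda i: times[i])
    bin_size = max(1, -(-len(order) // n_bins))  # ceiling division, at least 1
    clients = [[] for _ in range(n_clients)]
    for start in range(0, len(order), bin_size):
        bin_indices = order[start:start + bin_size]
        # A fresh Dirichlet draw per bin makes clients differ in label distribution.
        proportions = sample_dirichlet(alpha, n_clients, rng)
        picks = rng.choices(range(n_clients), weights=proportions, k=len(bin_indices))
        for idx, client in zip(bin_indices, picks):
            clients[client].append(idx)
    return clients


if __name__ == "__main__":
    sizes = [len(c) for c in quantity_skewed_split(1000, 5, alpha=0.5, seed=42)]
    print("client sizes:", sizes)
```

Both functions return one list of sample indices per client, so every sample lands on exactly one client. Fixing `seed` makes a split repeatable, in line with the reproducibility goal stated in the abstract.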
Related papers
- Adaptive Transformer Modelling of Density Function for Nonparametric Survival Analysis [11.35395323124404]
Survival analysis holds a crucial role across diverse disciplines, such as economics, engineering and healthcare.
We propose a novel survival regression method capable of producing high-quality unimodal PDFs without any prior distribution assumption.
arXiv Detail & Related papers (2024-09-10T04:29:59Z)
- Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations [19.560652381770243]
We introduce a novel framework that simultaneously handles incomplete data across modalities and censored survival labels.
Our approach employs advanced foundation models to encode individual modalities and align them into a universal representation space.
The proposed method demonstrates outstanding prediction accuracy in two survival analysis tasks on both employed datasets.
arXiv Detail & Related papers (2024-07-25T02:55:39Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Combining propensity score methods with variational autoencoders for generating synthetic data in presence of latent sub-groups [0.0]
Heterogeneity might be known, e.g., as indicated by sub-groups labels, or might be unknown and reflected only in properties of distributions, such as bimodality or skewness.
We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique.
arXiv Detail & Related papers (2023-12-12T22:49:24Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results [0.32593385688760446]
In the health sector, access to individual-level data is often challenging due to privacy concerns.
A promising alternative is the generation of fully synthetic data.
In this study, we use a state-of-the-art synthetic data generation method.
arXiv Detail & Related papers (2023-05-12T13:13:55Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- A Deep Variational Approach to Clustering Survival Data [5.871238645229228]
We introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting.
Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times.
arXiv Detail & Related papers (2021-06-10T14:10:25Z)
- Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles.
An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)