Federated unsupervised random forest for privacy-preserving patient
stratification
- URL: http://arxiv.org/abs/2401.16094v1
- Date: Mon, 29 Jan 2024 12:04:14 GMT
- Authors: Bastian Pfeifer, Christel Sirocchi, Marcus D. Bloice, Markus
Kreuzthaler, Martin Urschler
- Abstract summary: We introduce a novel multi-omics clustering approach based on unsupervised random forests.
We have validated our approach on machine learning benchmark data sets and on cancer data from The Cancer Genome Atlas (TCGA).
Our method is competitive with the state of the art in disease subtyping while substantially improving cluster interpretability.
- Score: 0.4499833362998487
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the realm of precision medicine, effective patient stratification and
disease subtyping demand innovative methodologies tailored for multi-omics
data. Clustering techniques applied to multi-omics data have become
instrumental in identifying distinct subgroups of patients, enabling a
finer-grained understanding of disease variability. This work establishes a
powerful framework for advancing precision medicine through unsupervised
random-forest-based clustering and federated computing. We introduce a novel
multi-omics clustering approach utilizing unsupervised random forests. The
unsupervised nature of the random forest enables the determination of
cluster-specific feature importance, unraveling key molecular contributors to
distinct patient groups. Moreover, our methodology is designed for federated
execution, a crucial aspect in the medical domain where privacy concerns are
paramount. We have validated our approach on machine learning benchmark data
sets as well as on cancer data from The Cancer Genome Atlas (TCGA). Our method
is competitive with the state of the art in disease subtyping while
substantially improving cluster interpretability. Experiments
indicate that local clustering performance can be improved through federated
computing.
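A minimal sketch of the general idea, not the authors' implementation: the abstract specifies neither the splitting rule nor the federation protocol, so both are assumptions here. Trees are grown without labels using purely random splits, each hypothetical site trains on its own rows, and only the tree structures are pooled into a shared forest. Patient similarity is then the fraction of trees in which two rows land in the same leaf. All names (`build_tree`, `leaf_id`, `proximity`, `make_site`) and the toy data are illustrative.

```python
import random


def build_tree(rows, rng, depth=0, max_depth=4):
    """Grow one tree without labels: random feature, random threshold.

    Returns None for a leaf, else (feature, threshold, left, right).
    The random splitting rule is an assumption, not the paper's criterion.
    """
    if depth >= max_depth or len(rows) < 2:
        return None
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    threshold = rng.uniform(min(values), max(values))
    left = [row for row in rows if row[feature] <= threshold]
    right = [row for row in rows if row[feature] > threshold]
    if not left or not right:  # degenerate split, stop here
        return None
    return (feature, threshold,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))


def leaf_id(tree, row):
    """Return the left/right path a row takes to its leaf, e.g. 'LRL'."""
    path = ""
    while tree is not None:
        feature, threshold, left, right = tree
        go_left = row[feature] <= threshold
        path += "L" if go_left else "R"
        tree = left if go_left else right
    return path


def proximity(forest, rows):
    """Fraction of trees in which each pair of rows shares a leaf."""
    n = len(rows)
    counts = [[0] * n for _ in range(n)]
    for tree in forest:
        leaves = [leaf_id(tree, row) for row in rows]
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / len(forest) for c in row] for row in counts]


def make_site(rng, per_group=20):
    """Toy site holding two well-separated synthetic patient groups."""
    group_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(per_group)]
    group_b = [[rng.gauss(8, 1), rng.gauss(8, 1)] for _ in range(per_group)]
    return group_a + group_b


rng = random.Random(0)
site_1, site_2 = make_site(rng), make_site(rng)

# Each site trains locally; raw rows are never exchanged in this sketch.
forest_1 = [build_tree(site_1, rng) for _ in range(50)]
forest_2 = [build_tree(site_2, rng) for _ in range(50)]

# Assumed federation step: pool the trees from both sites.
global_forest = forest_1 + forest_2

# Site 1 scores its own patients against the pooled forest.
prox = proximity(global_forest, site_1)

same = [prox[i][j] for i in range(40) for j in range(i + 1, 40)
        if (i < 20) == (j < 20)]
diff = [prox[i][j] for i in range(20) for j in range(20, 40)]
within = sum(same) / len(same)
between = sum(diff) / len(diff)
print(f"mean proximity within groups: {within:.3f}, between groups: {between:.3f}")
```

The resulting proximity matrix could be passed to any standard clustering routine, for example hierarchical clustering on `1 - prox`. Note that split thresholds are derived from local data, so sharing trees is not by itself a formal privacy guarantee.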
Related papers
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene
Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
(2023-11-28T09:14:55Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
(2023-04-10T17:58:22Z)
- Clustering individuals based on multivariate EMA time-series data [2.0824228840987447]
Ecological Momentary Assessment (EMA) methodological advancements have offered new opportunities to collect time-intensive, repeated and intra-individual measurements.
Advanced machine learning (ML) methods are needed to understand data characteristics and uncover meaningful relationships regarding the underlying complex psychological processes.
(2022-12-02T13:33:36Z)
- Simple and Scalable Algorithms for Cluster-Aware Precision Medicine [0.0]
We propose a simple and scalable approach to joint clustering and embedding.
This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods.
Our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings.
(2022-11-29T19:27:26Z)
- Contrastive learning for unsupervised medical image clustering and
reconstruction [0.23624125155742057]
We propose an unsupervised autoencoder framework which is augmented with a contrastive loss to encourage high separability in the latent space.
Our method achieves similar performance to the supervised architecture, indicating that separation in the latent space reproduces expert medical observer-assigned labels.
(2022-09-24T13:17:02Z)
- Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised
Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have been shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
(2022-02-07T13:08:11Z)
- A Deep Variational Approach to Clustering Survival Data [5.871238645229228]
We introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting.
Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times.
(2021-06-10T14:10:25Z)
- Cross-Cluster Weighted Forests [4.9873153106566575]
This article considers the effect of ensembling Random Forest learners trained on clusters within a single dataset with heterogeneity in the distribution of the features.
We find that constructing ensembles of forests trained on clusters determined by algorithms such as k-means results in significant improvements in accuracy and generalizability over the traditional Random Forest algorithm.
(2021-05-17T04:58:29Z)
- Deep Semi-Supervised Embedded Clustering (DSEC) for Stratification of
Heart Failure Patients [50.48904066814385]
In this work we apply deep semi-supervised embedded clustering to determine data-driven patient subgroups of heart failure.
We find clinically relevant clusters from an embedded space derived from heterogeneous data.
The proposed algorithm can potentially find new undiagnosed subgroups of patients that have different outcomes.
(2020-12-24T12:56:46Z)
- Temporal Phenotyping using Deep Predictive Clustering of Disease
Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
(2020-06-15T20:48:43Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from
Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling
Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
(2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.