DICE: Deep Significance Clustering for Outcome-Aware Stratification
- URL: http://arxiv.org/abs/2101.02344v1
- Date: Thu, 7 Jan 2021 03:06:52 GMT
- Title: DICE: Deep Significance Clustering for Outcome-Aware Stratification
- Authors: Yufang Huang, Kelly M. Axsom, John Lee, Lakshminarayanan Subramanian and Yiye Zhang
- Abstract summary: Deep significance clustering (DICE) is a framework for jointly performing representation learning and clustering for "outcome-aware" stratification.
DICE outperforms several baseline approaches as measured by the difference in outcome distribution across clusters.
- Score: 9.634559881417077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present deep significance clustering (DICE), a framework for jointly
performing representation learning and clustering for "outcome-aware"
stratification. DICE is intended to generate cluster membership that may be
used to categorize a population by individual risk level for a targeted
outcome. Following the representation learning and clustering steps, we embed
the objective function in DICE with a constraint which requires a statistically
significant association between the outcome and cluster membership of learned
representations. DICE further includes a neural architecture search step to
maximize both the likelihood of representation learning and outcome
classification accuracy with cluster membership as the predictor. To
demonstrate its utility in medicine for patient risk-stratification, the
performance of DICE was evaluated using two datasets with different outcome
ratios extracted from real-world electronic health records. Outcomes are
defined as acute kidney injury (30.4%) among a cohort of COVID-19 patients,
and discharge disposition (36.8%) among a cohort of heart failure patients,
respectively. Extensive results demonstrate that DICE has superior performance
as measured by the difference in outcome distribution across clusters,
Silhouette score, Calinski-Harabasz index, and Davies-Bouldin index for
clustering, and Area under the ROC Curve (AUC) for outcome classification
compared to several baseline approaches.
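The outcome-aware evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy data, the function names, and the choice of a Pearson chi-square test are assumptions made for illustration, since the abstract does not say which significance test DICE uses. The sketch takes a given cluster assignment and a binary outcome, then reports the outcome rate per cluster, a chi-square statistic for the association between cluster and outcome, and the AUC obtained when cluster membership is the only predictor.

```python
from collections import Counter

def outcome_rates(clusters, outcomes):
    """Fraction of positive outcomes (y == 1) within each cluster."""
    totals = Counter(clusters)
    positives = Counter(c for c, y in zip(clusters, outcomes) if y == 1)
    return {c: positives[c] / totals[c] for c in sorted(totals)}

def chi_square_statistic(clusters, outcomes):
    """Pearson chi-square statistic for the cluster-by-outcome table.

    Assumption: used here as a simple stand-in for the significance
    constraint; it is not taken from the paper.
    """
    n = len(clusters)
    cluster_totals = Counter(clusters)
    outcome_totals = Counter(outcomes)
    observed = Counter(zip(clusters, outcomes))
    stat = 0.0
    for c, n_c in cluster_totals.items():
        for y, n_y in outcome_totals.items():
            expected = n_c * n_y / n
            stat += (observed[(c, y)] - expected) ** 2 / expected
    return stat

def auc_from_clusters(clusters, outcomes):
    """AUC when each sample is scored by its cluster's outcome rate."""
    rates = outcome_rates(clusters, outcomes)
    scores = [rates[c] for c in clusters]
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: two clusters of 10 samples each.
clusters = [0] * 10 + [1] * 10
outcomes = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2

rates = outcome_rates(clusters, outcomes)
chi2 = chi_square_statistic(clusters, outcomes)
auc = auc_from_clusters(clusters, outcomes)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
significant = chi2 > 3.841
print(rates, round(chi2, 3), auc, significant)
```

On this toy input the two clusters have outcome rates of 0.2 and 0.8, so the clustering separates low-risk from high-risk samples and the association passes the 0.05 threshold. In a real pipeline the cluster labels would come from the learned representations rather than being fixed by hand.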
Related papers
- Rethinking Divisive Hierarchical Clustering from a Distributional Perspective [7.023830532843621]
Divisive Hierarchical Clustering (DHC) methods produce a dendrogram that does not have three desired properties.
We show that this shortcoming can be addressed by using a distributional kernel, instead of the set-oriented criterion.
Our proposed method successfully creates a dendrogram that is consistent with the biological regions in a Spatial Transcriptomics dataset.
arXiv Detail & Related papers (2026-01-27T15:41:56Z)
- Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery [5.669361767058639]
Estimating heterogeneous treatment effects is critical in domains such as personalized medicine, resource allocation, and policy evaluation.
We propose a novel framework that clusters individuals based on estimated treatment effects using a learned kernel derived from causal forests.
arXiv Detail & Related papers (2025-09-06T17:01:23Z)
- Comparative analysis of unsupervised clustering techniques using validation metrics: Study on cognitive features from the Canadian Longitudinal Study on Aging (CLSA) [0.0]
The CLSA dataset includes 18,891 participants with data available at both baseline and follow-up assessments.
The clustering methodologies employed in this analysis are K-means (KM) clustering, Hierarchical Clustering (HC) and Partitioning Around Medoids (PAM).
Using evaluation metrics to compare the results of the three clustering techniques, K-means and Partitioning Around Medoids (PAM) produced similar results.
arXiv Detail & Related papers (2025-04-07T21:13:51Z)
- Towards Learnable Anchor for Deep Multi-View Clustering [49.767879678193005]
In this paper, we propose the Deep Multi-view Anchor Clustering (DMAC) model that performs clustering in linear time.
With the optimal anchors, the full sample graph is calculated to derive a discriminative embedding for clustering.
Experiments on several datasets demonstrate superior performance and efficiency of DMAC compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-03-16T09:38:11Z)
- A Self-Supervised Learning-based Approach to Clustering Multivariate Time-Series Data with Missing Values (SLAC-Time): An Application to TBI Phenotyping [8.487912181381404]
We present a Self-supervised Learning-based Approach to Clustering multivariate Time-series data with missing values (SLAC-Time).
SLAC-Time is a Transformer-based clustering method that uses time-series forecasting as a proxy task for leveraging unlabeled data.
Experiments show that SLAC-Time outperforms the baseline K-means clustering algorithm in terms of silhouette coefficient, Calinski Harabasz index, Dunn index, and Davies Bouldin index.
arXiv Detail & Related papers (2023-02-27T01:05:17Z)
- Non-parametric Clustering of Multivariate Populations with Arbitrary Sizes [0.0]
We propose a clustering procedure to group K populations into subgroups with the same dependence structure.
We illustrate our clustering algorithm via numerical studies and through two real datasets.
arXiv Detail & Related papers (2022-11-11T16:38:29Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- The Group Loss++: A deeper look into group loss for deep metric learning [65.19665861268574]
Group Loss is a loss function based on a differentiable label-propagation method that enforces embedding similarity across all samples of a group.
We show state-of-the-art results on clustering and image retrieval on four datasets, and present competitive results on two person re-identification datasets.
arXiv Detail & Related papers (2022-04-04T14:09:58Z)
- Contrastive Fine-grained Class Clustering via Generative Adversarial Networks [9.667133604169829]
We introduce C3-GAN, a method that leverages the categorical inference power of InfoGAN by applying contrastive learning.
C3-GAN achieved state-of-the-art clustering performance on four fine-grained benchmark datasets.
arXiv Detail & Related papers (2021-12-30T08:57:11Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Deep Semi-Supervised Embedded Clustering (DSEC) for Stratification of Heart Failure Patients [50.48904066814385]
In this work we apply deep semi-supervised embedded clustering to determine data-driven patient subgroups of heart failure.
We find clinically relevant clusters from an embedded space derived from heterogeneous data.
The proposed algorithm can potentially find new undiagnosed subgroups of patients that have different outcomes.
arXiv Detail & Related papers (2020-12-24T12:56:46Z)
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
arXiv Detail & Related papers (2020-06-15T20:48:43Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.