A Self-Supervised Learning-based Approach to Clustering Multivariate
Time-Series Data with Missing Values (SLAC-Time): An Application to TBI
Phenotyping
- URL: http://arxiv.org/abs/2302.13457v2
- Date: Sat, 27 May 2023 20:21:49 GMT
- Authors: Hamid Ghaderi, Brandon Foreman, Amin Nayebi, Sindhu Tipirneni, Chandan
K. Reddy, Vignesh Subbian
- Abstract summary: We present a Self-supervised Learning-based Approach to Clustering multivariate Time-series data with missing values (SLAC-Time).
SLAC-Time is a Transformer-based clustering method that uses time-series forecasting as a proxy task for leveraging unlabeled data.
Experiments show that SLAC-Time outperforms the baseline K-means clustering algorithm in terms of silhouette coefficient, Calinski-Harabasz index, Dunn index, and Davies-Bouldin index.
- Score: 8.487912181381404
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised learning approaches provide a promising direction for
clustering multivariate time-series data. However, real-world time-series data
often include missing values, and the existing approaches require imputing
missing values before clustering, which may cause extensive computations and
noise and result in invalid interpretations. To address these challenges, we
present a Self-supervised Learning-based Approach to Clustering multivariate
Time-series data with missing values (SLAC-Time). SLAC-Time is a
Transformer-based clustering method that uses time-series forecasting as a
proxy task for leveraging unlabeled data and learning more robust time-series
representations. This method jointly learns the neural network parameters and
the cluster assignments of the learned representations. It iteratively clusters
the learned representations with the K-means method and then utilizes the
subsequent cluster assignments as pseudo-labels to update the model parameters.
To evaluate our proposed approach, we applied it to clustering and phenotyping
Traumatic Brain Injury (TBI) patients in the Transforming Research and Clinical
Knowledge in Traumatic Brain Injury (TRACK-TBI) study. Our experiments
demonstrate that SLAC-Time outperforms the baseline K-means clustering
algorithm in terms of silhouette coefficient, Calinski-Harabasz index, Dunn
index, and Davies-Bouldin index. We identified three TBI phenotypes that are
distinct from one another in terms of clinically significant variables as well
as clinical outcomes, including the Extended Glasgow Outcome Scale (GOSE)
score, Intensive Care Unit (ICU) length of stay, and mortality rate. The
experiments show that the TBI phenotypes identified by SLAC-Time can be
potentially used for developing targeted clinical trials and therapeutic
strategies.
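
The alternating procedure described in the abstract (cluster the learned representations with K-means, then use the cluster assignments as pseudo-labels to update the model) can be illustrated with a minimal sketch. This is not the authors' implementation: a small tanh encoder stands in for the Transformer, there is no forecasting pre-training, and missing values are not handled. The function names, the representation width of 8, the learning rate, and the number of rounds are all assumptions made for illustration.

```python
import numpy as np


def kmeans(x, k, rng, n_iter=20):
    """Plain Lloyd's K-means; returns one cluster index per row of x."""
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def encode(x, w_enc):
    """Hypothetical stand-in encoder: one linear layer followed by tanh."""
    return np.tanh(x @ w_enc)


def alternate_cluster_and_train(x, k, rounds=3, steps=50, lr=0.5, seed=0):
    """Alternate K-means on representations with pseudo-label training."""
    rng = np.random.default_rng(seed)
    rep_dim = 8  # assumed representation width
    w_enc = rng.normal(scale=0.1, size=(x.shape[1], rep_dim))
    for _ in range(rounds):
        # Step 1: cluster the current representations to get pseudo-labels.
        pseudo = kmeans(encode(x, w_enc), k, rng)
        onehot = np.eye(k)[pseudo]
        # Cluster indices are arbitrary across rounds, so the classification
        # head is re-initialised each round (a design choice of this sketch).
        w_cls = rng.normal(scale=0.1, size=(rep_dim, k))
        # Step 2: update encoder and head with cross-entropy on pseudo-labels.
        for _ in range(steps):
            z = encode(x, w_enc)
            logits = z @ w_cls
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            g_logits = (probs - onehot) / len(x)
            g_z = g_logits @ w_cls.T  # computed before the head is updated
            w_cls -= lr * (z.T @ g_logits)
            w_enc -= lr * (x.T @ (g_z * (1.0 - z ** 2)))
    # Final assignments come from clustering the trained representations.
    return kmeans(encode(x, w_enc), k, rng), w_enc


# Usage on synthetic data: three Gaussian blobs in four dimensions.
data_rng = np.random.default_rng(1)
x = np.vstack([data_rng.normal(loc=c, size=(20, 4)) for c in (-3.0, 0.0, 3.0)])
labels, w_enc = alternate_cluster_and_train(x, k=3)
```

The returned `labels` hold one cluster index per sample and `w_enc` holds the trained encoder weights. In a setting closer to the paper, the encoder would first be trained on a forecasting proxy task before this alternation begins.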
Related papers
- Concrete Dense Network for Long-Sequence Time Series Clustering [4.307648859471193]
Time series clustering is fundamental in data analysis for discovering temporal patterns.
Deep temporal clustering methods have been trying to integrate the canonical k-means into end-to-end training of neural networks.
LoSTer is a novel dense autoencoder architecture for the long-sequence time series clustering problem.
arXiv Detail & Related papers (2024-05-08T12:31:35Z)
- Identifying TBI Physiological States by Clustering Multivariate Clinical Time-Series Data [8.487912181381404]
SLAC-Time is an innovative self-supervision-based approach that maintains data integrity by avoiding imputation or aggregation.
By using SLAC-Time to cluster data in a large research dataset, we identified three distinct TBI physiological states.
arXiv Detail & Related papers (2023-03-23T04:16:00Z)
- Time Associated Meta Learning for Clinical Prediction [78.99422473394029]
We propose a novel time associated meta learning (TAML) method to make effective predictions at multiple future time points.
To address the sparsity problem after task splitting, TAML employs a temporal information sharing strategy to augment the number of positive samples.
We demonstrate the effectiveness of TAML on multiple clinical datasets, where it consistently outperforms a range of strong baselines.
arXiv Detail & Related papers (2023-03-05T03:54:54Z)
- T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression [82.85825388788567]
We develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data.
We show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines.
arXiv Detail & Related papers (2023-02-24T13:30:35Z)
- Deep Temporal Contrastive Clustering [21.660509622172274]
This paper presents a deep temporal contrastive clustering approach.
It incorporates the contrastive learning paradigm into the deep time series clustering research.
Experiments on a variety of time series datasets demonstrate the superiority of our approach over the state-of-the-art.
arXiv Detail & Related papers (2022-12-29T16:43:34Z)
- Clustering individuals based on multivariate EMA time-series data [2.0824228840987447]
Ecological Momentary Assessment (EMA) methodological advancements have offered new opportunities to collect time-intensive, repeated and intra-individual measurements.
Advanced machine learning (ML) methods are needed to understand data characteristics and uncover meaningful relationships regarding the underlying complex psychological processes.
arXiv Detail & Related papers (2022-12-02T13:33:36Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
- Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level.
This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
arXiv Detail & Related papers (2020-06-15T20:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.