Related papers: A Self-Supervised Learning-based Approach to Clustering Multivariate Time-Series Data with Missing Values (SLAC-Time): An Application to TBI Phenotyping

A Self-Supervised Learning-based Approach to Clustering Multivariate Time-Series Data with Missing Values (SLAC-Time): An Application to TBI Phenotyping

URL: http://arxiv.org/abs/2302.13457v2
Date: Sat, 27 May 2023 20:21:49 GMT
Title: A Self-Supervised Learning-based Approach to Clustering Multivariate Time-Series Data with Missing Values (SLAC-Time): An Application to TBI Phenotyping
Authors: Hamid Ghaderi, Brandon Foreman, Amin Nayebi, Sindhu Tipirneni, Chandan K. Reddy, Vignesh Subbian
Abstract summary: We present a Self-supervised Learning-based Approach to Clustering multivariate Time-series data with missing values (SLAC-Time) SLAC-Time is a Transformer-based clustering method that uses time-series forecasting as a proxy task for leveraging unlabeled data. Experiments show that SLAC-Time outperforms the baseline K-means clustering algorithm in terms of silhouette coefficient, Calinski Harabasz index, Dunn index, and Davies Bouldin index.
Score: 8.487912181381404
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Self-supervised learning approaches provide a promising direction for clustering multivariate time-series data. However, real-world time-series data often include missing values, and the existing approaches require imputing missing values before clustering, which may cause extensive computations and noise and result in invalid interpretations. To address these challenges, we present a Self-supervised Learning-based Approach to Clustering multivariate Time-series data with missing values (SLAC-Time). SLAC-Time is a Transformer-based clustering method that uses time-series forecasting as a proxy task for leveraging unlabeled data and learning more robust time-series representations. This method jointly learns the neural network parameters and the cluster assignments of the learned representations. It iteratively clusters the learned representations with the K-means method and then utilizes the subsequent cluster assignments as pseudo-labels to update the model parameters. To evaluate our proposed approach, we applied it to clustering and phenotyping Traumatic Brain Injury (TBI) patients in the Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) study. Our experiments demonstrate that SLAC-Time outperforms the baseline K-means clustering algorithm in terms of silhouette coefficient, Calinski Harabasz index, Dunn index, and Davies Bouldin index. We identified three TBI phenotypes that are distinct from one another in terms of clinically significant variables as well as clinical outcomes, including the Extended Glasgow Outcome Scale (GOSE) score, Intensive Care Unit (ICU) length of stay, and mortality rate. The experiments show that the TBI phenotypes identified by SLAC-Time can be potentially used for developing targeted clinical trials and therapeutic strategies.

Related papers

Concrete Dense Network for Long-Sequence Time Series Clustering [4.307648859471193]
Time series clustering is fundamental in data analysis for discovering temporal patterns. Deep temporal clustering methods have been trying to integrate the canonical k-means into end-to-end training of neural networks. LoSTer is a novel dense autoencoder architecture for the long-sequence time series clustering problem.
arXiv Detail & Related papers (2024-05-08T12:31:35Z)
Identifying TBI Physiological States by Clustering Multivariate Clinical Time-Series Data [8.487912181381404]
SLAC-Time is an innovative self-supervision-based approach that maintains data integrity by avoiding imputation or aggregation. By using SLAC-Time to cluster data in a large research dataset, we identified three distinct TBI physiological states.
arXiv Detail & Related papers (2023-03-23T04:16:00Z)
Time Associated Meta Learning for Clinical Prediction [78.99422473394029]
We propose a novel time associated meta learning (TAML) method to make effective predictions at multiple future time points. To address the sparsity problem after task splitting, TAML employs a temporal information sharing strategy to augment the number of positive samples. We demonstrate the effectiveness of TAML on multiple clinical datasets, where it consistently outperforms a range of strong baselines.
arXiv Detail & Related papers (2023-03-05T03:54:54Z)
T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression [82.85825388788567]
We develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data. We show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines.
arXiv Detail & Related papers (2023-02-24T13:30:35Z)
Deep Temporal Contrastive Clustering [21.660509622172274]
This paper presents a deep temporal contrastive clustering approach. It incorporates the contrastive learning paradigm into the deep time series clustering research. Experiments on a variety of time series datasets demonstrate the superiority of our approach over the state-of-the-art.
arXiv Detail & Related papers (2022-12-29T16:43:34Z)
Clustering individuals based on multivariate EMA time-series data [2.0824228840987447]
Ecological Momentary Assessment (EMA) methodological advancements have offered new opportunities to collect time-intensive, repeated and intra-individual measurements. Advanced machine learning (ML) methods are needed to understand data characteristics and uncover meaningful relationships regarding the underlying complex psychological processes.
arXiv Detail & Related papers (2022-12-02T13:33:36Z)
Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability. CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means. We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection. Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch. Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level. This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest. Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
arXiv Detail & Related papers (2020-06-15T20:48:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.