Fast conformational clustering of extensive molecular dynamics
simulation data
- URL: http://arxiv.org/abs/2301.04492v1
- Date: Wed, 11 Jan 2023 14:36:43 GMT
- Title: Fast conformational clustering of extensive molecular dynamics
simulation data
- Authors: Simon Hunkler, Kay Diederichs, Oleksandra Kukharenko, Christine Peter
- Abstract summary: We present an unsupervised data processing workflow that is specifically designed to obtain a fast conformational clustering of long molecular dynamics simulation trajectories.
We combine two dimensionality reduction algorithms (cc_analysis and encodermap) with a density-based spatial clustering algorithm (HDBSCAN).
With the help of four test systems we illustrate the capability and performance of this clustering workflow.
- Score: 19.444636864515726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an unsupervised data processing workflow that is specifically
designed to obtain a fast conformational clustering of long molecular dynamics
simulation trajectories. In this approach we combine two dimensionality
reduction algorithms (cc_analysis and encodermap) with a density-based spatial
clustering algorithm (HDBSCAN). The proposed scheme benefits from the strengths
of the three algorithms while avoiding most of the drawbacks of the individual
methods. Here the cc_analysis algorithm is applied for the first time to
molecular simulation data. Encodermap complements cc_analysis by providing an
efficient way to process and assign large amounts of data to clusters. The main
goal of the procedure is to maximize the number of assigned frames of a given
trajectory, while keeping a clear conformational identity of the clusters that
are found. In practice we achieve this by using an iterative clustering
approach and a tunable root-mean-square-deviation-based criterion in the final
cluster assignment. This makes it possible to find clusters of different densities as well
as different degrees of structural identity. With the help of four test systems
we illustrate the capability and performance of this clustering workflow:
wild-type and thermostable mutant of the Trp-cage protein (TC5b and TC10b),
NTL9, and Protein B. Each of these systems poses individual challenges to the
scheme; together they give an overview of the advantages of the proposed method
as well as the potential difficulties that can arise when using it.
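The abstract mentions a tunable root-mean-square-deviation-based criterion in the final cluster assignment but does not spell it out. The following is only a minimal sketch of what such a criterion could look like, not the authors' implementation. It assumes that every frame is already superimposed on a common reference, that each cluster is summarized by one representative conformation, and that a frame is assigned to the nearest representative only if its RMSD is within a cutoff. The names `rmsd`, `assign_frames`, and `cutoff` are hypothetical, and the label -1 for unassigned frames follows the noise convention used by HDBSCAN.

```python
import math


def rmsd(conf_a, conf_b):
    """RMSD between two conformations given as lists of (x, y, z) tuples.

    Assumption: both conformations are already superimposed, so no
    rotational or translational fitting is performed here.
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("conformations must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def assign_frames(frames, representatives, cutoff):
    """Assign each frame to the closest cluster representative.

    A frame gets the index of the nearest representative if its RMSD is
    at most `cutoff` (hypothetical tunable parameter); otherwise it is
    labeled -1, meaning unassigned.
    """
    if not representatives:
        raise ValueError("at least one cluster representative is required")
    labels = []
    for frame in frames:
        distances = [rmsd(frame, rep) for rep in representatives]
        best = min(range(len(distances)), key=distances.__getitem__)
        labels.append(best if distances[best] <= cutoff else -1)
    return labels


# Toy data: two-atom "conformations" (illustrative only, not from the paper).
rep_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rep_b = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],  # close to rep_a
    [(0.0, 0.0, 0.0), (0.0, 0.9, 0.0)],  # close to rep_b
    [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0)],  # far from both
]
labels = assign_frames(frames, [rep_a, rep_b], cutoff=0.2)
print(labels)
```

In this sketch, a smaller cutoff yields clusters with a sharper structural identity at the cost of fewer assigned frames, while a larger cutoff assigns more frames but loosens the clusters. That is the trade-off the abstract describes as the main goal of the procedure.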
Related papers
- Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering [13.638434337947302]
FSSMSC is a novel solution to the high computational complexity commonly found in existing approaches.
The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks.
The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
arXiv Detail & Related papers (2024-08-11T06:54:00Z)
- Anchor-free Clustering based on Anchor Graph Factorization [17.218481911995365]
We introduce a novel method termed Anchor-free Clustering based on Anchor Graph Factorization (AFCAGF).
AFCAGF innovates in learning the anchor graph, requiring only the computation of pairwise distances between samples.
We evolve the concept of the membership matrix between cluster centers and samples in FKM into an anchor graph encompassing multiple anchor points and samples.
arXiv Detail & Related papers (2024-02-24T02:16:42Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - FLASC: A Flare-Sensitive Clustering Algorithm [0.0]
We present FLASC, an algorithm that detects branches within clusters to identify subpopulations.
Two variants of the algorithm are presented, which trade computational cost for noise robustness.
We show that both variants scale similarly to HDBSCAN* in terms of computational cost and provide stable outputs.
arXiv Detail & Related papers (2023-11-27T14:55:16Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Very Compact Clusters with Structural Regularization via Similarity and Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, Very Compact Clusters (VCC), for general datasets.
Our proposed approach achieves better clustering performance than most of the state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z)
- (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the use of a continuous version of DTW as a distance measure, which we call continuous dynamic time warping (CDTW).
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic on when to use it.
Our heuristic depends on the eigenspectrum shape of the dataset and yields competitive low-rank approximations on test datasets compared to other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-21T17:49:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.