Variational embedding of protein folding simulations using gaussian
mixture variational autoencoders
- URL: http://arxiv.org/abs/2108.12493v1
- Date: Fri, 27 Aug 2021 20:31:08 GMT
- Title: Variational embedding of protein folding simulations using gaussian
mixture variational autoencoders
- Authors: Mahdi Ghorbani, Samarjeet Prasad, Jeffery B. Klauda, Bernard R. Brooks
- Abstract summary: We devise a machine learning method that can simultaneously perform dimensionality reduction and clustering of biomolecular conformations.
We show that GMVAE can learn a reduced representation of the free energy landscape of protein folding.
We also show that the GMVAE embedding resembles the folding funnel, with folded states at the bottom of the funnel and unfolded states toward the outer part of the funnel.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conformational sampling of biomolecules using molecular dynamics simulations
often produces large amounts of high-dimensional data that are difficult to
interpret using conventional analysis techniques. Dimensionality reduction
methods are thus required to extract useful and relevant information. Here we
devise a machine learning method, the Gaussian mixture variational autoencoder
(GMVAE), that can simultaneously perform dimensionality reduction and clustering
of biomolecular conformations in an unsupervised way. We show that GMVAE can
learn a reduced representation of the free energy landscape of protein folding
with highly separated clusters that correspond to the metastable states during
folding. Since GMVAE uses a mixture of Gaussians as the prior, it can directly
account for the multi-basin nature of the protein folding free-energy landscape. To
make the model end-to-end differentiable, we use a Gumbel-softmax
distribution. We test the model on three long-timescale protein folding
trajectories and show that the GMVAE embedding resembles the folding funnel, with
folded states at the bottom of the funnel and unfolded states toward the outer part of the funnel.
Additionally, we show that the latent space of GMVAE can be used for kinetic
analysis, and that Markov state models built on this embedding produce folding and
unfolding timescales that are in close agreement with other rigorous dynamical
embeddings such as time-lagged independent component analysis (TICA).
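To illustrate the Gumbel-softmax relaxation mentioned in the abstract, here is a minimal NumPy sketch of how a discrete cluster assignment can be replaced by a continuous, softmax-based sample. It is not the authors' implementation: the function name `gumbel_softmax`, the temperature value, and the example cluster probabilities are assumptions chosen for illustration.

```python
import numpy as np

def gumbel_softmax(logits, temperature=0.5, rng=None):
    """Draw a relaxed one-hot sample for each row of `logits` (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via inverse transform sampling; the lower bound avoids log(0).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    # Perturb the logits, scale by the temperature, then apply a numerically stable softmax.
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)
    expy = np.exp(y)
    return expy / expy.sum(axis=-1, keepdims=True)

# Hypothetical cluster-assignment probabilities for one conformation (3 clusters).
probs = np.array([[0.7, 0.2, 0.1]])
sample = gumbel_softmax(np.log(probs), temperature=0.5, rng=np.random.default_rng(0))
print(sample)  # one row of non-negative weights that sum to 1
```

Lower temperatures push the samples closer to one-hot vectors, while higher temperatures make them smoother. In an automatic-differentiation framework the same computation lets gradients flow through the cluster assignment, which is the property the abstract relies on for end-to-end training.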
Related papers
- Adaptive Fuzzy C-Means with Graph Embedding [84.47075244116782]
Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM) based methods and mixture model based methods.
We propose a novel FCM-based clustering model that is capable of automatically learning an appropriate membership degree hyperparameter value.
arXiv Detail & Related papers (2024-05-22T08:15:50Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- Navigating protein landscapes with a machine-learned transferable coarse-grained model [29.252004942896875]
coarse-grained (CG) model with similar prediction performance has been a long-standing challenge.
We develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences.
We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins.
arXiv Detail & Related papers (2023-10-27T17:10:23Z)
- Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z)
- Implicit Transfer Operator Learning: Multiple Time-Resolution Surrogates for Molecular Dynamics [8.35780131268962]
We present Implicit Transfer Operator (ITO) Learning, a framework to learn surrogates of the simulation process with multiple time-resolutions.
We also present a coarse-grained CG-SE3-ITO model which can quantitatively model all-atom molecular dynamics.
arXiv Detail & Related papers (2023-05-29T12:19:41Z)
- Latent Space Diffusion Models of Cryo-EM Structures [6.968705314671148]
We train a diffusion model as an expressive, learnable prior in the cryoDRGN framework.
By learning an accurate model of the data distribution, our method unlocks tools in generative modeling, sampling, and distribution analysis.
arXiv Detail & Related papers (2022-11-25T15:17:10Z)
- GANs and Closures: Micro-Macro Consistency in Multiscale Modeling [0.0]
We present an approach that couples physics-based simulations and biasing methods for sampling conditional distributions with Machine Learning-based conditional generative adversarial networks.
We show that this framework can improve multiscale SDE dynamical systems sampling, and even shows promise for systems of increasing complexity.
arXiv Detail & Related papers (2022-08-23T03:45:39Z)
- Automated analysis of continuum fields from atomistic simulations using statistical machine learning [0.0]
We develop a methodology using statistical data mining and machine learning algorithms to automate the analysis of continuum field variables in atomistic simulations.
We focus on three important field variables: total strain, elastic strain and microrotation.
The peaks in the distribution of total strain are identified with a Gaussian mixture model and methods to circumvent overfitting problems are presented.
arXiv Detail & Related papers (2022-06-16T10:05:43Z)
- Equivariant Diffusion for Molecule Generation in 3D [74.289191525633]
This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations.
Experimentally, the proposed method significantly outperforms previous 3D molecular generative methods regarding the quality of generated samples and efficiency at training time.
arXiv Detail & Related papers (2022-03-31T12:52:25Z)
- GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation [102.85440102147267]
We propose a novel generative model named GeoDiff for molecular conformation prediction.
We show that GeoDiff is superior or comparable to existing state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-06T09:47:01Z)
- A Rigorous Link Between Self-Organizing Maps and Gaussian Mixture Models [78.6363825307044]
This work presents a mathematical treatment of the relation between Self-Organizing Maps (SOMs) and Gaussian Mixture Models (GMMs).
We show that energy-based SOM models can be interpreted as performing gradient descent.
This link allows SOMs to be treated as generative probabilistic models, giving a formal justification for using SOMs to detect outliers or for sampling.
arXiv Detail & Related papers (2020-09-24T14:09:04Z)