CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning
- URL: http://arxiv.org/abs/2305.08057v2
- Date: Fri, 9 Aug 2024 17:16:32 GMT
- Title: CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning
- Authors: Colin A. Grambow, Hayley Weir, Christian N. Cunningham, Tommaso Biancalani, Kangway V. Chuang
- Abstract summary: We introduce CREMP, a resource for the rapid development and evaluation of machine learning models for macrocyclic peptides.
CREMP contains 36,198 unique macrocyclic peptides and their high-quality structural ensembles generated using the Conformer-Rotamer Ensemble Sampling Tool (CREST).
Altogether, this new dataset contains nearly 31.3 million unique macrocycle geometries, each annotated with energies derived from semi-empirical extended tight-binding (xTB) DFT calculations.
- Score: 0.1747623282473278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computational and machine learning approaches to model the conformational landscape of macrocyclic peptides have the potential to enable rational design and optimization. However, accurate, fast, and scalable methods for modeling macrocycle geometries remain elusive. Recent deep learning approaches have significantly accelerated protein structure prediction and the generation of small-molecule conformational ensembles, yet similar progress has not been made for macrocyclic peptides due to their unique properties. Here, we introduce CREMP, a resource generated for the rapid development and evaluation of machine learning models for macrocyclic peptides. CREMP contains 36,198 unique macrocyclic peptides and their high-quality structural ensembles generated using the Conformer-Rotamer Ensemble Sampling Tool (CREST). Altogether, this new dataset contains nearly 31.3 million unique macrocycle geometries, each annotated with energies derived from semi-empirical extended tight-binding (xTB) DFT calculations. Additionally, we include 3,258 macrocycles with reported passive permeability data to couple conformational ensembles to experiment. We anticipate that this dataset will enable the development of machine learning models that can improve peptide design and optimization for novel therapeutics.
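Because each geometry in the dataset carries an energy, a common first step when working with such an ensemble is to convert per-conformer energies into Boltzmann populations. The following is a minimal sketch of that step, not the authors' pipeline: the function name, the energy values, the kcal/mol units, and the temperature of 298.15 K are all assumptions made for illustration and do not come from CREMP.

```python
import math

# Gas constant in kcal/(mol*K) times an assumed temperature of 298.15 K.
KT_KCAL_PER_MOL = 0.0019872041 * 298.15


def boltzmann_weights(energies_kcal_per_mol):
    """Return normalized Boltzmann weights for a list of conformer energies.

    Energies are shifted by the minimum before exponentiation so that the
    lowest-energy conformer has an unnormalized factor of exactly 1.
    """
    e_min = min(energies_kcal_per_mol)
    factors = [
        math.exp(-(e - e_min) / KT_KCAL_PER_MOL) for e in energies_kcal_per_mol
    ]
    total = sum(factors)
    return [f / total for f in factors]


# Hypothetical energies for the conformers of one macrocycle (not CREMP data).
example_energies = [0.0, 0.5, 1.2, 3.0]
weights = boltzmann_weights(example_energies)

for energy, weight in zip(example_energies, weights):
    print(f"E = {energy:4.1f} kcal/mol  ->  population = {weight:.3f}")
```

Shifting by the minimum energy leaves the weights unchanged but avoids overflow or underflow when absolute energies are large in magnitude.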
Related papers
- Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints [65.77915791312634]
We propose CP-Composer, a novel generative framework that enables zero-shot cyclic peptide generation. Our approach decomposes complex cyclization patterns into unit constraints, which are incorporated into a diffusion model. Our model, despite being trained on linear peptides, is capable of generating diverse target-binding cyclic peptides, reaching success rates from 38% to 84%.
arXiv Detail & Related papers (2025-07-06T03:30:45Z) - Multiscale guidance of AlphaFold3 with heterogeneous cryo-EM data [33.562685684224995]
Cryo-electron microscopy (cryo-EM) has emerged as a powerful tool for imaging near-native structural heterogeneity. Here, we combine cryo-EM density maps with the rich sequence and biophysical priors learned by protein structure prediction models. Our method, CryoBoltz, guides the sampling trajectory of a pretrained protein structure prediction model using both global and local structural constraints.
arXiv Detail & Related papers (2025-06-04T22:16:27Z) - Statistical learning of structure-property relationships for transport in porous media, using hybrid AI modeling [0.0]
The 3D microstructure of porous media significantly impacts the resulting macroscopic properties, including effective diffusivity or permeability.
Quantitative structure-property relationships are crucial for further optimizing the performance of porous media.
The present paper uses 90,000 virtually generated 3D microstructures of porous media derived from literature.
The paper extends these findings by applying a hybrid AI framework to this data set.
arXiv Detail & Related papers (2025-03-27T14:46:40Z) - MIND: Microstructure INverse Design with Generative Hybrid Neural Representation [25.55691106041371]
Inverse design of microstructures plays a pivotal role in optimizing metamaterials with specific, targeted physical properties.
We present a novel generative model that integrates latent diffusion with Holoplane, an advanced hybrid neural representation that simultaneously encodes both geometric and physical properties.
Our approach generalizes across multiple microstructure classes, enabling the generation of diverse, tileable microstructures with significantly improved property accuracy and enhanced control over geometric validity.
arXiv Detail & Related papers (2025-02-01T20:25:47Z) - Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.
Deep generative models have shown promise in generating protein conformations as a more efficient alternative.
We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z) - DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z) - Accurate and Efficient Structural Ensemble Generation of Macrocyclic Peptides using Internal Coordinate Diffusion [0.5475672579692472]
RINGER is a diffusion-based transformer model that generates 3D conformational ensembles of macrocyclic peptides from their 2D representations.
We show how RINGER generates both high-quality and diverse geometries at a fraction of the computational cost.
arXiv Detail & Related papers (2023-05-30T16:39:18Z) - AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery.
Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences.
We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z) - State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Atomic structure generation from reconstructing structural fingerprints [1.2128971613239876]
We present an end-to-end structure generation approach using atom-centered symmetry functions as the representation and conditional variational autoencoders as the generative model.
We are able to successfully generate novel and valid atomic structures of sub-nanometer Pt nanoparticles as a proof of concept.
arXiv Detail & Related papers (2022-07-27T00:42:59Z) - Linking Properties to Microstructure in Liquid Metal Embedded Elastomers via Machine Learning [0.0]
Liquid metals (LM) are embedded in an elastomer matrix to obtain soft composites with unique thermal, dielectric, and mechanical properties.
By linking the structure to the properties of these materials, it is possible to perform material design rationally.
arXiv Detail & Related papers (2022-07-24T06:02:26Z) - Three-dimensional microstructure generation using generative adversarial neural networks in the context of continuum micromechanics [77.34726150561087]
This work proposes a generative adversarial network tailored towards three-dimensional microstructure generation.
The lightweight algorithm is able to learn the underlying properties of the material from a single microCT scan without the need for explicit descriptors.
arXiv Detail & Related papers (2022-05-31T13:26:51Z) - Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.