Towards Joint Sequence-Structure Generation of Nucleic Acid and Protein
Complexes with SE(3)-Discrete Diffusion
- URL: http://arxiv.org/abs/2401.06151v1
- Date: Thu, 21 Dec 2023 05:53:33 GMT
- Authors: Alex Morehead, Jeffrey Ruffolo, Aadyot Bhatnagar, Ali Madani
- Abstract summary: We introduce MMDiff, a generative model that jointly designs sequences and structures of nucleic acid and protein complexes, independently or in complex.
Such a model has important implications for emerging areas of macromolecular design including structure-based transcription factor design and design of noncoding RNA sequences.
- Score: 4.292173366949847
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative models of macromolecules carry abundant and impactful implications
for industrial and biomedical efforts in protein engineering. However, existing
methods are currently limited to modeling protein structures or sequences,
independently or jointly, without regard to the interactions that commonly
occur between proteins and other macromolecules. In this work, we introduce
MMDiff, a generative model that jointly designs sequences and structures of
nucleic acid and protein complexes, independently or in complex, using joint
SE(3)-discrete diffusion noise. Such a model has important implications for
emerging areas of macromolecular design including structure-based transcription
factor design and design of noncoding RNA sequences. We demonstrate the utility
of MMDiff through a rigorous new design benchmark for macromolecular complex
generation that we introduce in this work. Our results demonstrate that MMDiff
is able to successfully generate micro-RNA and single-stranded DNA molecules
while being modestly capable of jointly modeling DNA and RNA molecules in
interaction with multi-chain protein complexes. Source code:
https://github.com/Profluent-Internships/MMDiff.
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends the discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z)
- Binding-Adaptive Diffusion Models for Structure-Based Drug Design [33.9764269117599]
We propose a novel framework, namely Binding-Adaptive Diffusion Models (BindDM).
In BindDM, we adaptively extract the subcomplex, the essential part of binding sites responsible for protein-ligand interactions.
BindDM can generate molecules with more realistic 3D structures and higher binding affinities towards the protein targets, with up to -5.92 Avg. Vina Score.
arXiv Detail & Related papers (2024-01-15T00:34:00Z)
- Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration [63.23362798102195]
We propose D3FG, a functional-group-based diffusion model for pocket-specific molecule generation and elaboration.
D3FG decomposes molecules into two categories of components: functional groups defined as rigid bodies and linkers as mass points.
In the experiments, our method can generate molecules with more realistic 3D structures, competitive affinities toward the protein targets, and better drug properties.
arXiv Detail & Related papers (2023-05-30T06:41:20Z)
- A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z)
- DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding [51.970607704953096]
Previous works usually generate atoms in an auto-regressive way, where element types and 3D coordinates of atoms are generated one by one.
In real-world molecular systems, the interactions among atoms in an entire molecule are global, so the energy function is pair-coupled among atoms.
In this work, a generative diffusion model for molecular 3D structures based on target proteins is established, at a full-atom level in a non-autoregressive way.
arXiv Detail & Related papers (2022-11-21T07:02:15Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Widely Used and Fast De Novo Drug Design by a Protein Sequence-Based Reinforcement Learning Model [4.815696666006742]
Structure-based de novo methods can overcome the data scarcity of active ligands by incorporating drug-target interaction into deep generative architectures.
Here, we demonstrate a widely used and fast protein sequence-based reinforcement learning model for drug discovery.
As a proof of concept, the RL model was utilized to design molecules for four targets.
arXiv Detail & Related papers (2022-08-14T10:41:52Z)
- Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models [3.5450828190071646]
An important task in bioengineering is designing proteins with specific 3D structures and chemical properties which enable targeted functions.
We introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous molecular generative modeling approaches.
arXiv Detail & Related papers (2022-05-26T16:10:09Z)
- Generating 3D Molecules Conditional on Receptor Binding Sites with Deep Generative Models [0.0]
We describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site.
We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities.
This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.
arXiv Detail & Related papers (2021-10-28T15:17:24Z)
- Neural representation and generation for RNA secondary structures [14.583976833366384]
Our work is concerned with the generation and targeted design of RNA, a type of genetic macromolecule.
The design of large scale and complex biological structures spurs dedicated graph-based deep generative modeling techniques.
We propose a flexible framework to jointly embed and generate different RNA structural modalities.
arXiv Detail & Related papers (2021-02-01T15:49:25Z)