PROflow: An iterative refinement model for PROTAC-induced structure prediction
- URL: http://arxiv.org/abs/2405.06654v1
- Date: Wed, 10 Apr 2024 05:29:35 GMT
- Title: PROflow: An iterative refinement model for PROTAC-induced structure prediction
- Authors: Bo Qiang, Wenxian Shi, Yuxuan Song, Menghua Wu
- Abstract summary: Proteolysis targeting chimeras (PROTACs) are small molecules that trigger the breakdown of traditionally "undruggable" proteins by binding simultaneously to their targets and degradation-associated proteins.
A key challenge in their rational design is understanding their structural basis of activity.
Existing PROTAC docking methods have been forced to simplify the problem into a distance-constrained protein-protein docking task.
We develop a novel pseudo-data generation scheme that requires only binary protein-protein complexes.
This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction that models the full PROTAC flexibility during constrained protein-protein docking.
- Score: 4.113597666007784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proteolysis targeting chimeras (PROTACs) are small molecules that trigger the breakdown of traditionally "undruggable" proteins by binding simultaneously to their targets and degradation-associated proteins. A key challenge in their rational design is understanding their structural basis of activity. Due to the lack of crystal structures (18 in the PDB), existing PROTAC docking methods have been forced to simplify the problem into a distance-constrained protein-protein docking task. To address the data issue, we develop a novel pseudo-data generation scheme that requires only binary protein-protein complexes. This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction that models the full PROTAC flexibility during constrained protein-protein docking. PROflow outperforms the state-of-the-art across docking metrics and runtime. Its inference speed enables the large-scale screening of PROTAC designs, and computed properties of predicted structures achieve statistically significant correlations with published degradation activities.
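The abstract notes that earlier methods reduce the problem to distance-constrained protein-protein docking. As a rough illustration of that simplification only (this is not PROflow's method, and not any published tool's API), the sketch below filters candidate rigid-body poses by whether the two ligand anchor points stay within a maximum linker length. All function names, the z-axis-only rotation, and the 10-unit cutoff are assumptions made for the example.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians (toy rotation)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))

def satisfies_linker_constraint(anchor_a, anchor_b, max_linker_length):
    """True if the two anchor points are close enough for a linker to span them."""
    return math.dist(anchor_a, anchor_b) <= max_linker_length

def filter_poses(anchor_a, anchor_b, poses, max_linker_length):
    """Keep the poses of protein B whose anchor stays within reach of protein A's anchor.

    Each pose is a hypothetical (angle, shift) pair: rotate B's anchor about z,
    then translate it. Protein A is held fixed.
    """
    kept = []
    for angle, shift in poses:
        moved_b = translate(rotate_z(anchor_b, angle), shift)
        if satisfies_linker_constraint(anchor_a, moved_b, max_linker_length):
            kept.append((angle, shift))
    return kept

# Toy usage: anchors and poses are made-up numbers, not real structures.
anchor_a = (0.0, 0.0, 0.0)
anchor_b = (1.0, 0.0, 0.0)
poses = [
    (0.0, (0.0, 0.0, 0.0)),          # anchors 1 unit apart
    (0.0, (20.0, 0.0, 0.0)),         # anchors 21 units apart
    (math.pi / 2, (0.0, 0.0, 0.0)),  # rotated, still 1 unit apart
]
kept = filter_poses(anchor_a, anchor_b, poses, max_linker_length=10.0)
```

A hard distance cutoff like this treats the PROTAC as nothing more than an upper bound on anchor separation, which is the simplification the abstract contrasts with modeling the full PROTAC flexibility.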
Related papers
- Language model driven: a PROTAC generation pipeline with dual constraints of structure and property [13.438107015508246]
The LM-PROTAC pipeline successfully generated PROTAC molecules capable of inhibiting Wnt3a.
The results show that DCT can efficiently generate PROTACs that target and hydrolyse Wnt3a.
arXiv Detail & Related papers (2024-12-12T10:15:12Z) - Pan-protein Design Learning Enables Task-adaptive Generalization for Low-resource Enzyme Design [44.258193520999484]
We present CrossDesign, a domain-adaptive framework that leverages pretrained protein language models (PPLMs).
By aligning protein structures with sequences, CrossDesign transfers pretrained knowledge to structure models, overcoming the limitations of limited structural data.
Experimental results highlight CrossDesign's superior performance and robustness, especially with out-of-domain enzymes.
arXiv Detail & Related papers (2024-11-26T17:51:33Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - A Comprehensive Review of Emerging Approaches in Machine Learning for De Novo PROTAC Design [1.534667887016089]
Targeted protein degradation (TPD) aims to regulate the intracellular levels of proteins by harnessing the cell's innate degradation pathways.
Proteolysis-targeting chimeras (PROTACs) are at the heart of TPD strategies.
Traditional methodologies for designing such complex molecules have limitations.
arXiv Detail & Related papers (2024-06-24T14:42:27Z) - Endowing Protein Language Models with Structural Knowledge [5.587293092389789]
We introduce a novel framework that enhances protein language models by integrating protein structural data.
The refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database.
PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction.
arXiv Detail & Related papers (2024-01-26T12:47:54Z) - Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows it with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z) - De novo PROTAC design using graph-based deep generative models [2.566673015346446]
We show that a graph-based generative model can be used to propose PROTAC-like structures from empty graphs.
We steer the generative model towards compounds with higher likelihoods of predicted degradation activity.
After fine-tuning, predicted activity against a challenging POI increases from 50% to >80% with near-perfect chemical validity.
arXiv Detail & Related papers (2022-11-04T15:34:45Z) - State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($\geq$80%) predictions of protein class and architecture from structures determined at low ($\leq$3A) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)