Language model driven: a PROTAC generation pipeline with dual constraints of structure and property
- URL: http://arxiv.org/abs/2412.09661v1
- Date: Thu, 12 Dec 2024 10:15:12 GMT
- Title: Language model driven: a PROTAC generation pipeline with dual constraints of structure and property
- Authors: Jinsong Shao, Qineng Gong, Zeyu Yin, Yu Chen, Yajie Hao, Lei Zhang, Linlin Jiang, Min Yao, Jinlong Li, Fubo Wang, Li Wang
- Abstract summary: The LM-PROTAC pipeline successfully generated PROTAC molecules capable of inhibiting Wnt3a.
The results show that DCT can efficiently generate PROTACs that target and hydrolyse Wnt3a.
- Score: 13.438107015508246
- Abstract: The imperfect modeling of ternary complexes has limited the application of computer-aided drug discovery tools in PROTAC research and development. In this study, an AI-assisted PROTAC molecule design pipeline named LM-PROTAC (language model driven Proteolysis Targeting Chimera) was developed by embedding a transformer-based generative model with dual constraints on structure and properties, referred to as the DCT. The pipeline uses a fragment-based representation of molecules and proceeds in three steps. First, a language model driven protein-compound affinity model screens molecular fragments with high affinity for the target protein. Second, the structural and physicochemical properties of these fragments are constrained during generation to meet the requirements of the specific scenario. Finally, the preliminarily generated molecules undergo a two-round screening with a multidimensional property prediction model, yielding a batch of PROTAC molecules capable of degrading disease-relevant target proteins for validation in in vitro experiments, thus providing a complete solution for AI-assisted PROTAC drug generation. Taking the key tumor target Wnt3a as an example, the LM-PROTAC pipeline successfully generated PROTAC molecules capable of inhibiting Wnt3a. The results show that the DCT can efficiently generate PROTACs that target and hydrolyse Wnt3a.
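The three stages named in the abstract (affinity-based fragment screening, generation under structure and property constraints, two-round screening) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the transformer-based generator is replaced here by plain enumeration of fragment combinations, and every name, score, weight, and threshold (`Fragment`, `screen_fragments`, `generate_candidates`, `two_round_screen`, `run_pipeline`, `FRAGMENTS`) is a hypothetical placeholder chosen only to show the control flow.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Fragment:
    """Toy fragment record; all values are made up for illustration."""
    name: str
    role: str            # "warhead", "linker", or "e3_ligand"
    mol_weight: float    # stand-in for a physicochemical property
    affinity: float = 0.0  # stand-in for a predicted target affinity in [0, 1]


def screen_fragments(fragments, min_affinity):
    """Stage 1: keep warhead fragments whose predicted affinity is high enough."""
    return [f for f in fragments
            if f.role == "warhead" and f.affinity >= min_affinity]


def generate_candidates(warheads, linkers, e3_ligands, max_weight):
    """Stage 2: build candidates under two constraints.

    Structure constraint: exactly one warhead, one linker, one E3 ligand.
    Property constraint: summed molecular weight must not exceed max_weight.
    (Enumeration is an assumption; the paper uses a generative model.)
    """
    candidates = []
    for warhead, linker, ligand in product(warheads, linkers, e3_ligands):
        total = warhead.mol_weight + linker.mol_weight + ligand.mol_weight
        if total <= max_weight:
            candidates.append((warhead, linker, ligand))
    return candidates


def two_round_screen(candidates, first_filter, second_filter):
    """Stage 3: apply two successive property filters."""
    survivors = [c for c in candidates if first_filter(c)]
    return [c for c in survivors if second_filter(c)]


def run_pipeline(fragments, min_affinity=0.7, max_weight=1000.0):
    warheads = screen_fragments(fragments, min_affinity)
    linkers = [f for f in fragments if f.role == "linker"]
    e3_ligands = [f for f in fragments if f.role == "e3_ligand"]
    candidates = generate_candidates(warheads, linkers, e3_ligands, max_weight)
    return two_round_screen(
        candidates,
        first_filter=lambda c: c[1].mol_weight <= 200.0,  # toy linker-size check
        second_filter=lambda c: c[0].affinity >= 0.8,     # toy stricter affinity check
    )


# Hypothetical fragment library; none of these are real compounds.
FRAGMENTS = [
    Fragment("warhead_A", "warhead", 350.0, 0.9),
    Fragment("warhead_B", "warhead", 300.0, 0.75),
    Fragment("warhead_C", "warhead", 280.0, 0.4),
    Fragment("linker_short", "linker", 150.0),
    Fragment("linker_long", "linker", 400.0),
    Fragment("e3_ligand_X", "e3_ligand", 400.0),
]

if __name__ == "__main__":
    for warhead, linker, ligand in run_pipeline(FRAGMENTS):
        print(warhead.name, linker.name, ligand.name)
```

In a real setting each stage would be backed by a learned model (an affinity predictor, a constrained generator, and property predictors); the sketch only shows how the output of one stage narrows the input of the next.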
Related papers
- A Comprehensive Review of Emerging Approaches in Machine Learning for De Novo PROTAC Design [1.534667887016089]
Targeted protein degradation (TPD) aims to regulate the intracellular levels of proteins by harnessing the cell's innate degradation pathways.
Proteolysis-targeting chimeras (PROTACs) are at the heart of TPD strategies.
Traditional methodologies for designing such complex molecules have limitations.
arXiv Detail & Related papers (2024-06-24T14:42:27Z)
- PROflow: An iterative refinement model for PROTAC-induced structure prediction [4.113597666007784]
Proteolysis targeting chimeras (PROTACs) are small molecules that trigger the breakdown of traditionally "undruggable" proteins by binding simultaneously to their targets and degradation-associated proteins.
A key challenge in their rational design is understanding their structural basis of activity.
Existing PROTAC docking methods have been forced to simplify the problem into a distance-constrained protein-protein docking task.
We develop a novel pseudo-data generation scheme that requires only binary protein-protein complexes.
This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction that models the full PROTAC flexibility during constrained
arXiv Detail & Related papers (2024-04-10T05:29:35Z)
- Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers'.
We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language.
arXiv Detail & Related papers (2024-03-20T02:15:55Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [74.64101864289572]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
- Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot Antibody Designer [58.97153056120193]
The specificity of an antibody is determined by its complementarity-determining regions (CDRs).
Previous studies have utilized complex techniques to generate CDRs, but they suffer from inadequate geometric modeling.
We propose a simple yet effective model that can co-design 1D sequences and 3D structures of CDRs in a one-shot manner.
arXiv Detail & Related papers (2023-04-21T13:24:26Z)
- De novo PROTAC design using graph-based deep generative models [2.566673015346446]
We show that a graph-based generative model can be used to propose PROTAC-like structures from empty graphs.
We steer the generative model towards compounds with higher likelihoods of predicted degradation activity.
After fine-tuning, predicted activity against a challenging POI increases from 50% to >80% with near-perfect chemical validity.
arXiv Detail & Related papers (2022-11-04T15:34:45Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 [11.853524110656991]
Design of new drug compounds with target properties is a key area of research in generative modeling.
We present a small drug molecule design pipeline based on graph-generative models and a comparison study of two state-of-the-art graph generative models for designing COVID-19 targeted drug candidates.
arXiv Detail & Related papers (2021-02-09T17:49:26Z)
- CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins.
CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme.
CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z)