Related papers: Exploiting Pretrained Biochemical Language Models for Targeted Drug Design

URL: http://arxiv.org/abs/2209.00981v1
Date: Fri, 2 Sep 2022 12:21:51 GMT
Title: Exploiting Pretrained Biochemical Language Models for Targeted Drug Design
Authors: Gökçe Uludoğan, Elif Ozkirimli, Kutlu O. Ulgen, Nilgün Karalı, Arzucan Özgür
Abstract summary: We propose exploiting pretrained biochemical language models to initialize targeted molecule generation models. We compare two decoding strategies to generate compounds: beam search and sampling.
Score: 0.1889930012459365
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Motivation: The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein language and the chemical language. However, such a model is limited by the availability of interacting protein-ligand pairs. On the other hand, large amounts of unlabeled protein sequences and chemical compounds are available and have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate two warm start strategies: (i) a one-stage strategy, where the initialized model is trained on targeted molecule generation, and (ii) a two-stage strategy, where pre-finetuning on molecular generation is followed by target-specific training. We also compare two decoding strategies to generate compounds: beam search and sampling.
Results: The results show that the warm-started models perform better than a baseline model trained from scratch. The two proposed warm-start strategies achieve similar results to each other with respect to widely used metrics from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests that the one-stage strategy generalizes better than the two-stage strategy. Additionally, we observe that beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound quality.
Availability and implementation: The source code is available at https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145
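
The abstract compares two decoding strategies, beam search and sampling. The sketch below illustrates the difference on a toy next-token model. It is not the authors' implementation (their code is in the repository linked above): the vocabulary, the probability table `NEXT_TOKEN_PROBS`, and the function names `beam_search` and `sample` are all assumptions made up for illustration.

```python
import math
import random

# Hypothetical toy next-token model over a tiny SMILES-like vocabulary.
# The probabilities are invented for illustration; a real model would
# compute them with a neural decoder conditioned on the target protein.
START, END = "<s>", "</s>"
NEXT_TOKEN_PROBS = {
    "<s>": {"C": 0.6, "N": 0.3, "O": 0.1},
    "C": {"C": 0.3, "O": 0.2, "</s>": 0.5},
    "N": {"C": 0.7, "</s>": 0.3},
    "O": {"C": 0.4, "</s>": 0.6},
}


def beam_search(beam_size=2, max_len=5):
    """Deterministic decoding: expand every kept hypothesis, then keep
    only the beam_size candidates with the highest log probability."""
    beams = [(0.0, [START])]  # (log probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, tokens in beams:
            for tok, p in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((logp + math.log(p), tokens + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:beam_size]:
            # Hypotheses that emitted END are set aside as complete.
            (finished if cand[1][-1] == END else beams).append(cand)
        if not beams:
            break
    # Hypotheses still unfinished after max_len steps are discarded.
    finished.sort(key=lambda c: c[0], reverse=True)
    return ["".join(tokens[1:-1]) for _, tokens in finished]


def sample(rng, max_len=5):
    """Stochastic decoding: draw each next token from the model's
    distribution instead of keeping the most probable candidates."""
    tokens = [START]
    for _ in range(max_len):
        dist = NEXT_TOKEN_PROBS[tokens[-1]]
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if tok == END:
            break
        tokens.append(tok)
    return "".join(tokens[1:])


if __name__ == "__main__":
    print("beam search:", beam_search())
    rng = random.Random(0)
    print("sampling:   ", [sample(rng) for _ in range(5)])
```

In this sketch, beam search returns the same ranked list on every call, while sampling returns different strings depending on the random seed. That is the trade-off between the two strategies: beam search favors high-likelihood outputs, sampling favors diversity.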

Related papers

Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization [147.7899503829411]
AliDiff is a novel framework to align pretrained target diffusion models with preferred functional properties. It can generate molecules with state-of-the-art binding energies, reaching an Avg. Vina Score of up to -7.07.
arXiv Detail & Related papers (2024-07-01T06:10:29Z)
Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs. Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules are much lower, which facilitates detection. Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations.
arXiv Detail & Related papers (2024-04-24T03:25:53Z)
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously. xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning [2.01243755755303]
We introduce TargetVAE, a target-aware auto-encoder that generates ligands with high binding affinities to arbitrary protein targets. This is the first effort to unify different representations of proteins into a single model, which we name the Protein Multimodal Network (PMN).
arXiv Detail & Related papers (2023-08-02T12:08:17Z)
Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks [0.0]
We propose an end-to-end generative system, DrugGEN, for the de novo design of drug candidate molecules. The system is trained using a large dataset of drug-like compounds and target-specific bioactive molecules. Using the open-access DrugGEN, it is possible to easily train models for other druggable proteins.
arXiv Detail & Related papers (2023-02-15T18:59:27Z)
Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation. Deep learning models have emerged as an efficient way to discover synergistic combinations. Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
A Transformer-based Generative Model for De Novo Molecular Design [4.6782243206450325]
We propose a Transformer-based deep model for de novo target-specific molecular design. The proposed method is capable of generating both drug-like compounds and target-specific compounds.
arXiv Detail & Related papers (2022-10-17T05:03:35Z)
Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation. We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
A biologically-inspired evaluation of molecular generative machine learning [17.623886600638716]
A novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed. We propose a recreation metric and apply drug-target affinity prediction and molecular docking as complementary techniques for the evaluation of generative outputs.
arXiv Detail & Related papers (2022-08-20T11:01:10Z)
Widely Used and Fast De Novo Drug Design by a Protein Sequence-Based Reinforcement Learning Model [4.815696666006742]
Structure-based de novo methods can overcome the data scarcity of active ligands by incorporating drug-target interaction into deep generative architectures. Here, we demonstrate a widely used and fast protein sequence-based reinforcement learning model for drug discovery. As a proof of concept, the RL model was utilized to design molecules for four targets.
arXiv Detail & Related papers (2022-08-14T10:41:52Z)
Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data. Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
