Exploiting Pretrained Biochemical Language Models for Targeted Drug
Design
- URL: http://arxiv.org/abs/2209.00981v1
- Date: Fri, 2 Sep 2022 12:21:51 GMT
- Title: Exploiting Pretrained Biochemical Language Models for Targeted Drug
Design
- Authors: Gökçe Uludoğan, Elif Ozkirimli, Kutlu O. Ulgen, Nilgün
Karalı, Arzucan Özgür
- Abstract summary: We propose exploiting pretrained biochemical language models to initialize targeted molecule generation models.
We compare two decoding strategies to generate compounds: beam search and sampling.
- Score: 0.1889930012459365
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Motivation: The development of novel compounds targeting proteins of interest
is one of the most important tasks in the pharmaceutical industry. Deep
generative models have been applied to targeted molecular design and have shown
promising results. Recently, target-specific molecule generation has been
viewed as a translation between the protein language and the chemical language.
However, such a model is limited by the availability of interacting
protein-ligand pairs. On the other hand, large amounts of unlabeled protein
sequences and chemical compounds are available and have been used to train
language models that learn useful representations. In this study, we propose
exploiting pretrained biochemical language models to initialize (i.e. warm
start) targeted molecule generation models. We investigate two warm start
strategies: (i) a one-stage strategy where the initialized model is trained on
targeted molecule generation, and (ii) a two-stage strategy consisting of
pre-finetuning on molecular generation followed by target-specific training. We
also compare two decoding strategies to generate compounds: beam search and
sampling.
Results: The results show that the warm-started models perform better than a
baseline model trained from scratch. The two proposed warm-start strategies
achieve similar results to each other with respect to widely used metrics from
benchmarks. However, docking evaluation of the generated compounds for a number
of novel proteins suggests that the one-stage strategy generalizes better than
the two-stage strategy. Additionally, we observe that beam search outperforms
sampling in both docking evaluation and benchmark metrics for assessing
compound quality.
Availability and implementation: The source code is available at
https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials
are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145
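The abstract compares two decoding strategies, beam search and sampling. The sketch below illustrates the difference on a toy next-token model. It is not the authors' implementation: the `TOY_LM` table, its token set, and the function names `beam_search` and `sample` are hypothetical and chosen only for illustration. A real targeted generation model would also condition every step on the protein sequence.

```python
import math
import random

# Hypothetical toy "language model": next-token probabilities depend only on
# the previous token. The tokens loosely resemble SMILES atoms, nothing more.
TOY_LM = {
    "<s>": {"C": 0.6, "N": 0.3, "O": 0.1},
    "C": {"C": 0.5, "O": 0.2, "N": 0.1, "</s>": 0.2},
    "N": {"C": 0.6, "O": 0.1, "</s>": 0.3},
    "O": {"C": 0.4, "</s>": 0.6},
}


def beam_search(lm, beam_size=3, max_len=6):
    """Deterministic decoding: keep the beam_size best partial sequences per step.

    Returns finished (log_prob, tokens) pairs, highest log-probability first.
    Hypotheses still unfinished after max_len steps are dropped.
    """
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for logp, toks in beams:
            for tok, p in lm[toks[-1]].items():
                candidates.append((logp + math.log(p), toks + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Keep the top beam_size; move completed ones to the finished list.
        beams = []
        for logp, toks in candidates[:beam_size]:
            if toks[-1] == "</s>":
                finished.append((logp, toks))
            else:
                beams.append((logp, toks))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished[:beam_size]


def sample(lm, rng, max_len=6):
    """Stochastic decoding: draw each next token from the model distribution."""
    toks = ["<s>"]
    logp = 0.0
    for _ in range(max_len):
        options = lm[toks[-1]]
        tok = rng.choices(list(options), weights=list(options.values()))[0]
        logp += math.log(options[tok])
        toks.append(tok)
        if tok == "</s>":
            break
    return logp, toks


if __name__ == "__main__":
    for logp, toks in beam_search(TOY_LM):
        print("beam  ", round(logp, 3), "".join(toks[1:-1]))
    rng = random.Random(0)
    for _ in range(3):
        logp, toks = sample(TOY_LM, rng)
        print("sample", round(logp, 3), " ".join(toks))
```

Beam search returns the same high-probability sequences on every run, while sampling trades some likelihood for diversity. That trade-off is one plausible reading of why the two strategies score differently in the reported evaluation.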
Related papers
- Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs.
Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection.
Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations.
arXiv Detail & Related papers (2024-04-24T03:25:53Z)
- xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering
the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
- Co-modeling the Sequential and Graphical Routes for Peptide
Representation Learning [67.66393016797181]
We propose a peptide co-modeling method, RepCon, to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models.
RepCon learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs.
Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework.
arXiv Detail & Related papers (2023-10-04T16:58:25Z)
- Target-aware Variational Auto-encoders for Ligand Generation with
Multimodal Protein Representation Learning [2.01243755755303]
We introduce TargetVAE, a target-aware auto-encoder that generates ligands with high binding affinities to arbitrary protein targets.
This is the first effort to unify different representations of proteins into a single model, which we name the Protein Multimodal Network (PMN).
arXiv Detail & Related papers (2023-08-02T12:08:17Z)
- Target Specific De Novo Design of Drug Candidate Molecules with Graph
Transformer-based Generative Adversarial Networks [0.0]
We propose DrugGEN, for the de novo design of drug candidate molecules that interact with selected target proteins.
DrugGEN is trained using a large dataset of compounds from ChEMBL and target-specific bioactive molecules.
arXiv Detail & Related papers (2023-02-15T18:59:27Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training
and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- A Transformer-based Generative Model for De Novo Molecular Design [4.6782243206450325]
We propose a Transformer-based deep model for de novo target-specific molecular design.
The proposed method is capable of generating both drug-like compounds and target-specific compounds.
arXiv Detail & Related papers (2022-10-17T05:03:35Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- A biologically-inspired evaluation of molecular generative machine
learning [17.623886600638716]
A novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed.
We propose a recreation metric and apply drug-target affinity prediction and molecular docking as complementary techniques for the evaluation of generative outputs.
arXiv Detail & Related papers (2022-08-20T11:01:10Z)
- Widely Used and Fast De Novo Drug Design by a Protein Sequence-Based
Reinforcement Learning Model [4.815696666006742]
Structure-based de novo methods can overcome the data scarcity of active ligands by incorporating drug-target interaction into deep generative architectures.
Here, we demonstrate a widely used and fast protein sequence-based reinforcement learning model for drug discovery.
As a proof of concept, the RL model was utilized to design molecules for four targets.
arXiv Detail & Related papers (2022-08-14T10:41:52Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.