Target-aware Variational Auto-encoders for Ligand Generation with
Multimodal Protein Representation Learning
- URL: http://arxiv.org/abs/2309.16685v1
- Date: Wed, 2 Aug 2023 12:08:17 GMT
- Title: Target-aware Variational Auto-encoders for Ligand Generation with
Multimodal Protein Representation Learning
- Authors: Nhat Khang Ngo and Truong Son Hy
- Abstract summary: We introduce TargetVAE, a target-aware variational auto-encoder that generates ligands with high binding affinities to arbitrary protein targets.
This is the first effort to unify different representations of proteins into a single model, which we name the Protein Multimodal Network (PMN).
- Score: 2.01243755755303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Without knowledge of specific pockets, generating ligands based on the global
structure of a protein target plays a crucial role in drug discovery as it
helps reduce the search space for potential drug-like candidates in the
pipeline. However, contemporary methods require optimizing tailored networks
for each protein, which is arduous and costly. To address this issue, we
introduce TargetVAE, a target-aware variational auto-encoder that generates
ligands with high binding affinities to arbitrary protein targets, guided by a
novel multimodal deep neural network built on graph Transformers as the
prior for the generative model. This is the first effort to unify different
representations of proteins (e.g., amino-acid sequence, 3D structure) into
a single model, which we name the Protein Multimodal Network (PMN). Our multimodal
architecture learns from the entire protein structures and is able to capture
their sequential, topological and geometrical information. We showcase the
superiority of our approach by conducting extensive experiments and
evaluations, including the assessment of generative model quality, ligand
generation for unseen targets, docking score computation, and binding affinity
prediction. Empirical results demonstrate the promising performance of our
proposed approach. Our software package is publicly available at
https://github.com/HySonLab/Ligand_Generation
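
To make the architecture described in the abstract concrete, the sketch below shows the core idea of a target-conditioned VAE: the encoder and decoder are both conditioned on a protein embedding (which in the paper would come from the PMN encoder), so that sampling from the latent prior with a new protein embedding yields ligands for an unseen target. This is a minimal NumPy illustration under assumed dimensions, not the paper's implementation; all layer sizes, weight initializations, and function names here are hypothetical.

```python
# Minimal sketch of a target-conditioned VAE (illustrative only; TargetVAE
# uses graph Transformers and a learned multimodal prior, not these toy layers).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: ligand feature vector (64), protein embedding (32),
# latent space (16).
D_LIG, D_PROT, D_Z = 64, 32, 16

# Randomly initialized weights stand in for trained parameters.
W_enc = rng.normal(size=(D_LIG + D_PROT, 2 * D_Z)) * 0.01
b_enc = np.zeros(2 * D_Z)
W_dec = rng.normal(size=(D_Z + D_PROT, D_LIG)) * 0.01
b_dec = np.zeros(D_LIG)

def encode(ligand, protein):
    """q(z | ligand, protein): returns mean and log-variance of the posterior."""
    h = np.concatenate([ligand, protein]) @ W_enc + b_enc
    return h[:D_Z], h[D_Z:]

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, protein):
    """p(ligand | z, protein): the decoder also sees the target embedding."""
    return np.concatenate([z, protein]) @ W_dec + b_dec

# Training-time pass: encode a known ligand-protein pair, then reconstruct.
ligand = rng.normal(size=D_LIG)
protein = rng.normal(size=D_PROT)   # would come from the PMN protein encoder
mu, logvar = encode(ligand, protein)
z = reparameterize(mu, logvar)
recon = decode(z, protein)

# Generation for an unseen target: sample z from the prior and decode it
# with the new protein's embedding.
new_protein = rng.normal(size=D_PROT)
generated = decode(rng.normal(size=D_Z), new_protein)
```

Because the protein embedding enters both the encoder and the decoder, no per-protein network needs to be retrained: swapping in a new embedding at generation time is what lets a single model serve arbitrary targets.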
Related papers
- OneProt: Towards Multi-Modal Protein Foundation Models [5.440531199006399]
We introduce OneProt, a multi-modal AI for proteins that integrates structural, sequence, alignment, and binding site data.
It surpasses state-of-the-art methods in various downstream tasks, including metal ion binding classification, gene-ontology annotation, and enzyme function prediction.
This work expands multi-modal capabilities in protein models, paving the way for applications in drug discovery, biocatalytic reaction planning, and protein engineering.
arXiv Detail & Related papers (2024-11-07T16:54:54Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language [0.24434823694833652]
MAMMAL is a versatile multi-task multi-align foundation model that learns from large-scale biological datasets.
We introduce a prompt syntax that supports a wide range of classification, regression, and generation tasks.
We evaluate the model on 11 diverse downstream tasks spanning different steps within a typical drug discovery pipeline.
arXiv Detail & Related papers (2024-10-28T20:45:52Z) - xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering
the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z) - Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers [18.498779242323582]
We propose a novel approach, Prot2Text, which predicts a protein's function in free-text form.
By combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, our model effectively integrates diverse data types.
arXiv Detail & Related papers (2023-07-25T09:35:43Z) - HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for
Highly Accurate Protein-Ligand Binding Affinity Prediction [0.0]
We present a novel deep learning architecture consisting of a 3-dimensional convolutional neural network and two graph convolutional networks.
HAC-Net obtains state-of-the-art results on the PDBbind v.2016 core set.
We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction.
arXiv Detail & Related papers (2022-12-23T16:14:53Z) - HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein
Language Model as an Alternative [61.984700682903096]
HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2.
Our proposed method pre-trains a large-scale protein language model on billions of primary sequences.
We obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence.
arXiv Detail & Related papers (2022-07-28T07:30:33Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based
Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in
Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture that allows us to infuse a deep neural network with human-powered abstraction at the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z) - Sequence-guided protein structure determination using graph
convolutional and recurrent networks [0.0]
Single particle, cryogenic electron microscopy (cryo-EM) experiments now routinely produce high-resolution data for large proteins.
Existing protocols for this type of task often rely on significant human intervention and can take hours to many days to produce an output.
Here, we present a fully automated, template-free model building approach that is based entirely on neural networks.
arXiv Detail & Related papers (2020-07-14T06:24:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.