Related papers: Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL

URL: http://arxiv.org/abs/2405.02374v1
Date: Fri, 3 May 2024 10:33:19 GMT
Title: Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL
Authors: Arturo Fiorellini-Bernardis, Sebastien Boyer, Christoph Brunken, Bakary Diallo, Karim Beguir, Nicolas Lopez-Carranza, Oliver Bent
Abstract summary: Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in-silico predictions and in-vitro observations. We propose eGRAL, a novel graph neural network architecture designed for predicting binding affinity changes from amino acid substitutions in protein complexes. eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models.
Score: 1.840390797252648
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Protein-protein interactions (PPIs) play a crucial role in numerous biological processes. Developing methods that predict binding affinity changes under substitution mutations is fundamental for modelling and re-engineering biological systems. Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in-silico predictions and in-vitro observations. With this contribution, we propose eGRAL, a novel SE(3) equivariant graph neural network (eGNN) architecture designed for predicting binding affinity changes from multiple amino acid substitutions in protein complexes. eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models. To address the limited availability of large-scale affinity assays with structural information, we generate a simulated dataset comprising approximately 500,000 data points. Our model is pre-trained on this dataset, then fine-tuned and tested on experimental data.

Related papers

PETIMOT: A Novel Framework for Inferring Protein Motions from Sparse Data Using SE(3)-Equivariant Graph Neural Networks [1.2289361708127877]
This paper presents a novel perspective on the problem by targeting continuous compact representations of protein motions inferred from sparse experimental observations. Our method PETIMOT (Protein sEquence and sTructure-based Inference of MOTions) leverages transfer learning from pre-trained protein language models through an SE(3)-equivariant graph neural network. When trained and evaluated on the Protein Data Bank, PETIMOT shows superior performance in time and accuracy, capturing protein dynamics, particularly large/slow conformational changes, compared to state-of-the-art flow-matching approaches and traditional physics-based models.
arXiv Detail & Related papers (2025-03-19T09:25:32Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models. It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features. Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
Stacked ensemble-based mutagenicity prediction model using multiple modalities with graph attention network [0.9736758288065405]
Mutagenicity is a concern due to its association with genetic mutations which can result in a variety of negative consequences. In this work, we introduce a novel stacked ensemble based mutagenicity prediction model.
arXiv Detail & Related papers (2024-09-03T09:14:21Z)
Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling. We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z)
Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry. We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z)
Improved K-mer Based Prediction of Protein-Protein Interactions With Chaos Game Representation, Deep Learning and Reduced Representation Bias [0.0]
We present a method for extracting unique pairs from an interaction dataset, generating non-redundant paired data for unbiased machine learning. We develop a convolutional neural network model capable of learning and predicting interactions from Chaos Game Representations of proteins' coding genes.
arXiv Detail & Related papers (2023-10-23T10:02:23Z)
Reprogramming Pretrained Language Models for Protein Sequence Representation Learning [68.75392232599654]
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework. R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences. Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z)
SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering [6.216757583450049]
We develop SESNet, a supervised deep-learning model to predict the fitness for protein mutants. We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship. Our model can achieve strikingly high accuracy in prediction of the fitness of protein mutants, especially for the higher order variants.
arXiv Detail & Related papers (2022-12-29T01:49:52Z)
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures. Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein. Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules. Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
Protein 3D structure-based neural networks highly improve the accuracy in compound-protein binding affinity prediction [7.059949221160259]
We develop Fast Evolutional Attention and Thoroughgoing-graph Neural Networks (FeatNN) to facilitate the application of protein 3D structure information for predicting compound-protein binding affinities (CPAs). FeatNN considerably outperforms various state-of-the-art baselines in CPA prediction with the Pearson value elevated by about 35.7%.
arXiv Detail & Related papers (2022-03-30T00:44:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.