Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models
- URL: http://arxiv.org/abs/2407.00111v1
- Date: Thu, 27 Jun 2024 13:04:58 GMT
- Title: Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models
- Authors: Ben Fauber
- Abstract summary: We describe the accurate prediction of ligand-protein interaction (LPI) affinities with instruction fine-tuned pretrained generative small language models (SLMs).
Our results demonstrate a clear improvement over machine learning (ML) and free-energy perturbation (FEP+) based methods in accurately predicting a range of LPI affinities.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe the accurate prediction of ligand-protein interaction (LPI) affinities, also known as drug-target interactions (DTI), with instruction fine-tuned pretrained generative small language models (SLMs). We achieved accurate predictions for a range of affinity values associated with ligand-protein interactions on out-of-sample data in a zero-shot setting. Only the SMILES string of the ligand and the amino acid sequence of the protein were used as the model inputs. Our results demonstrate a clear improvement over machine learning (ML) and free-energy perturbation (FEP+) based methods in accurately predicting a range of ligand-protein interaction affinities, which can be leveraged to further accelerate drug discovery campaigns against challenging therapeutic targets.
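The abstract states that only the ligand's SMILES string and the protein's amino acid sequence are used as model inputs. A minimal sketch of how such a pair might be formatted as an instruction fine-tuning record follows. The instruction wording, the record field names (`instruction`, `input`, `output`), the helper `build_lpi_record`, and the example affinity value are all hypothetical and are not taken from the paper.

```python
import json
from typing import Optional

# Hypothetical instruction text; the paper's actual prompt wording is not given here.
INSTRUCTION = "Predict the binding affinity of the ligand for the protein."


def build_lpi_record(smiles: str, sequence: str,
                     affinity: Optional[float] = None) -> dict:
    """Format one ligand-protein pair as an instruction-tuning record.

    The affinity label is optional so the same helper can build
    training records (with a target) and inference prompts (without).
    """
    smiles = smiles.strip()
    # Remove whitespace from the sequence and normalize to upper case.
    sequence = "".join(sequence.split()).upper()
    if not smiles or not sequence:
        raise ValueError("both a SMILES string and a protein sequence are required")

    record = {
        "instruction": INSTRUCTION,
        "input": f"Ligand SMILES: {smiles}\nProtein sequence: {sequence}",
    }
    if affinity is not None:
        # Assumed label format: a fixed two-decimal number in arbitrary units.
        record["output"] = f"{affinity:.2f}"
    return record


# Illustrative values only: aspirin's SMILES and a made-up sequence fragment.
example = build_lpi_record("CC(=O)Oc1ccccc1C(=O)O",
                           "mktayiakqr qisfvkshfs",
                           affinity=6.5)
print(json.dumps(example, indent=2))
```

Keeping the label optional lets one function serve both training and zero-shot inference; any real pipeline would substitute its own template and affinity units.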
Related papers
- Modeling Dabrafenib Response Using Multi-Omics Modality Fusion and Protein Network Embeddings Based on Graph Convolutional Networks [0.0]
Cancer cell response to targeted therapy arises from complex molecular interactions, making single-omics data insufficient for accurate prediction.
This study develops a model to predict Dabrafenib sensitivity by integrating multiple omics layers (genomics, transcriptomics, epigenomics, metabolomics) with protein network embeddings generated using Graph Convolutional Networks (GCN).
Results show that attention-guided multi-omics fusion combined with GCN improves drug response prediction and reveals complementary molecular determinants of Dabrafenib sensitivity.
arXiv Detail & Related papers (2025-12-13T02:00:56Z) - Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design [61.2846583160056]
Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains.
This is in part because high-quality annotated data are far more difficult to obtain for proteins than for natural language.
We present a simple and general recipe for fast SFT of PLMs, designed to improve the fidelity, reliability, and novelty of generated protein sequences.
arXiv Detail & Related papers (2025-12-10T05:34:47Z) - Learning to Align Molecules and Proteins: A Geometry-Aware Approach to Binding Affinity [0.0]
FIRM-DTI is a lightweight framework that conditions molecular embeddings on protein embeddings through a feature-wise linear modulation layer.
An RBF regression head operating on embedding distances yields smooth, interpretable affinity predictions.
Our results underscore the value of conditioning and metric learning for robust drug-target affinity prediction.
arXiv Detail & Related papers (2025-09-25T02:55:24Z) - Deep Learning Model for Amyloidogenicity Prediction using a Pre-trained Protein LLM [0.0]
Recent approaches to predicting amyloidogenicity within proteins rely heavily on evolutionary motifs and the individual properties of amino acids.
Our study evaluated the contextual features of protein sequences obtained from a pretrained protein large language model.
Our method achieved an accuracy of 84.5% on 10-fold cross-validation and an accuracy of 83% on the test dataset.
arXiv Detail & Related papers (2025-08-18T02:21:48Z) - KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction [60.23701115249195]
KEPLA is a novel deep learning framework that integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance.
Experiments on two benchmark datasets demonstrate that KEPLA consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-16T08:02:42Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches [48.66541987908136]
Much work has been devoted to predicting binding affinity over the past decades.
We note growing use of both traditional machine learning and deep learning models for predicting binding affinity.
With improved predictive performance and the FDA's phasing out of animal testing, AI-driven in silico models, such as AI virtual cells (AIVCs), are poised to advance binding affinity prediction.
arXiv Detail & Related papers (2024-09-30T03:40:49Z) - ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z) - PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z) - Learning to Denoise Biomedical Knowledge Graph for Robust Molecular Interaction Prediction [50.7901190642594]
We propose BioKDN (Biomedical Knowledge Graph Denoising Network) for robust molecular interaction prediction.
BioKDN refines the reliable structure of local subgraphs by denoising noisy links in a learnable manner.
It maintains consistent and robust semantics by smoothing relations around the target interaction.
arXiv Detail & Related papers (2023-12-09T07:08:00Z) - Improved K-mer Based Prediction of Protein-Protein Interactions With Chaos Game Representation, Deep Learning and Reduced Representation Bias [0.0]
We present a method for extracting unique pairs from an interaction dataset, generating non-redundant paired data for unbiased machine learning.
We develop a convolutional neural network model capable of learning and predicting interactions from Chaos Game Representations of proteins' coding genes.
arXiv Detail & Related papers (2023-10-23T10:02:23Z) - From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities.
It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset.
In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv Detail & Related papers (2022-08-19T14:55:12Z) - Associative Learning Mechanism for Drug-Target Interaction Prediction [6.107658437700639]
Drug-target affinity (DTA) represents the strength of drug-target interaction (DTI).
Traditional methods lack the interpretability of the DTA prediction process.
This paper proposes a DTA prediction method with interactive learning and an autoencoder mechanism.
arXiv Detail & Related papers (2022-05-24T14:25:28Z) - AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands [9.135203550164833]
We show that state-of-the-art models fail to generalize to novel structures.
We introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training.
We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins.
arXiv Detail & Related papers (2021-12-25T01:52:58Z) - Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches, beating the second best by 9.1% and 20.5% for binding activity and binding pose prediction, respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z) - Cross-Modality Protein Embedding for Compound-Protein Affinity and Contact Prediction [15.955668586941472]
We consider proteins as multi-modal data including 1D amino-acid sequences and (sequence-predicted) 2D residue-pair contact maps.
We empirically evaluate the embeddings of the two single modalities in their accuracy and generalizability of CPAC prediction.
arXiv Detail & Related papers (2020-11-14T04:42:25Z) - CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins.
CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme.
CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z)