Related papers: Machine learning approaches for interpretable antibody property prediction using structural data

URL: http://arxiv.org/abs/2510.23975v1
Date: Tue, 28 Oct 2025 01:13:09 GMT
Title: Machine learning approaches for interpretable antibody property prediction using structural data
Authors: Kevin Michalewicz, Mauricio Barahona, Barbara Bravi
Abstract summary: Understanding the relationship between antibody sequence, structure and function is essential for the design of antibody-based therapeutics and research tools. Machine learning models mostly based on the application of large language models to sequence information have been developed to predict antibody properties. This chapter describes two ML frameworks that integrate structural data (via graph representations) with neural networks to predict properties of antibodies.
Score: 1.406995367117218
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding the relationship between antibody sequence, structure and function is essential for the design of antibody-based therapeutics and research tools. Recently, machine learning (ML) models mostly based on the application of large language models to sequence information have been developed to predict antibody properties. Yet there are open directions to incorporate structural information, not only to enhance prediction but also to offer insights into the underlying molecular mechanisms. This chapter provides an overview of these approaches and describes two ML frameworks that integrate structural data (via graph representations) with neural networks to predict properties of antibodies: ANTIPASTI predicts binding affinity (a global property) whereas INFUSSE predicts residue flexibility (a local property). We survey the principles underpinning these models; the ways in which they encode structural knowledge; and the strategies that can be used to extract biologically relevant statistical signals that can help discover and disentangle molecular determinants of the properties of interest.

Related papers

FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics [22.314786276794717]
The identification and property prediction of chemical molecules is of central importance in the advancement of drug discovery and materials science. Deep learning models appear promising for predicting molecular structure spectra, but their overall assessment remains challenging. Our contribution is the creation of the benchmark framework FlexMS for constructing and evaluating diverse model architectures in mass spectrum prediction.
arXiv Detail & Related papers (2026-02-26T10:05:01Z)
Exploring Protein Language Model Architecture-Induced Biases for Antibody Comprehension [24.38887522188594]
We investigate how architectural choices in protein language models (PLMs) influence their ability to comprehend antibody sequence characteristics and functions. We evaluate three state-of-the-art PLMs (AntiBERTa, BioBERT, and ESM2) against a general-purpose language model (GPT-2) baseline on antibody target specificity prediction tasks. Our results demonstrate that while all PLMs achieve high classification accuracy, they exhibit distinct biases in capturing biological features such as V gene usage, somatic hypermutation patterns, and isotype information.
arXiv Detail & Related papers (2025-12-10T18:22:51Z)
Fragment-Wise Interpretability in Graph Neural Networks via Molecule Decomposition and Contribution Analysis [0.9217021281095907]
SEAL (Substructure Explanation via Attribution Learning) is a new interpretable graph neural network that attributes model predictions to meaningful molecular subgraphs. SEAL decomposes input graphs into chemically relevant fragments and estimates their causal influence on the output.
arXiv Detail & Related papers (2025-08-20T19:15:53Z)
Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets. We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
arXiv Detail & Related papers (2025-07-12T08:10:10Z)
Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins [4.747546562792329]
We introduce Ibex, a pan-immunoglobulin structure prediction model. It achieves state-of-the-art accuracy in modeling the variable domains of antibodies, nanobodies, and T-cell receptors.
arXiv Detail & Related papers (2025-07-11T22:09:03Z)
PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [88.98041407783502]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv Detail & Related papers (2025-07-07T15:21:05Z)
Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization [11.87029706744257]
We propose a framework that uses graph neural networks (GNNs) to predict compound-protein affinity. We train GNNs with structure-aware loss functions using group lasso and sparse group lasso coloring regularizations. Our approach improved property prediction by integrating common and uncommon node information with sparse group lasso.
arXiv Detail & Related papers (2025-07-04T06:12:18Z)
KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction [60.23701115249195]
KEPLA is a novel deep learning framework that integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. Experiments on two benchmark datasets demonstrate that KEPLA consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-16T08:02:42Z)
Linear to Neural Networks Regression: QSPR of Drugs via Degree-Distance Indices [0.0]
The study provides an innovative perspective on integrating topological indices with machine learning to enhance predictive accuracy. These predictions also suggest that establishing a reliable relationship between topological indices and physical properties enables chemists to gain preliminary insights into molecular behavior.
arXiv Detail & Related papers (2025-03-18T20:03:59Z)
Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph. GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors. We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (>3 Å) resolution. We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts [80.69440684790925]
DeepRelations is a physics-inspired deep relational network with an intrinsically explainable architecture. It shows superior interpretability to state-of-the-art methods. It boosts the AUPRC of contact prediction by 9.5-, 16.9-, 19.3- and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets, respectively.
arXiv Detail & Related papers (2019-12-29T00:14:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.