Sequence-Based Nanobody-Antigen Binding Prediction
- URL: http://arxiv.org/abs/2308.01920v1
- Date: Sat, 15 Jul 2023 02:00:19 GMT
- Title: Sequence-Based Nanobody-Antigen Binding Prediction
- Authors: Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib,
Khurram Bashir, Imdad Ullah Khan, Murray Patterson
- Abstract summary: A critical challenge in nanobodies production is the unavailability of nanobodies for a majority of antigens.
This study aims to develop a machine-learning method to predict Nanobody-Antigen binding solely based on the sequence data.
- Score: 1.7284653203366596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nanobodies (Nb) are monomeric heavy-chain fragments derived from heavy-chain
only antibodies naturally found in Camelids and Sharks. Their considerably
small size (~3-4 nm; 13 kDa) and favorable biophysical properties make them
attractive targets for recombinant production. Furthermore, their unique
ability to bind selectively to specific antigens, such as toxins, chemicals,
bacteria, and viruses, makes them powerful tools in cell biology, structural
biology, medical diagnostics, and future therapeutic agents in treating cancer
and other serious illnesses. However, a critical challenge in nanobodies
production is the unavailability of nanobodies for a majority of antigens.
Although some computational methods have been proposed to screen potential
nanobodies for given target antigens, their practical application is highly
restricted due to their reliance on 3D structures. Moreover, predicting
nanobody-antigen interactions (binding) is a time-consuming and labor-intensive
task. This study aims to develop a machine-learning method to predict
Nanobody-Antigen binding solely based on the sequence data. We curated a
comprehensive dataset of Nanobody-Antigen binding and nonbinding data and
devised an embedding method based on gapped k-mers to predict binding based
only on sequences of nanobody and antigen. Our approach achieves up to 90%
accuracy in binding prediction and is significantly more efficient compared to
the widely-used computational docking technique.
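The abstract names a gapped k-mer embedding but does not give its details. The following Python sketch shows one common formulation and should not be read as the authors' implementation. It assumes that every window of length `g` contributes all patterns that keep `k` of its positions and mask the rest with `_`. The function names `gapped_kmer_counts` and `pair_features`, the default values of `g` and `k`, and the `nb:`/`ag:` feature prefixes are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, g=3, k=2):
    """Count gapped k-mers of a sequence.

    For every window of length g, keep k positions and mask the
    remaining g - k positions with '_' (assumed gap symbol).
    """
    if not 0 < k <= g:
        raise ValueError("need 0 < k <= g")
    counts = Counter()
    position_sets = list(combinations(range(g), k))
    for start in range(len(seq) - g + 1):
        window = seq[start:start + g]
        for kept in position_sets:
            pattern = "".join(
                window[i] if i in kept else "_" for i in range(g)
            )
            counts[pattern] += 1
    return counts


def pair_features(nanobody, antigen, g=3, k=2):
    """Combine the gapped k-mer counts of both sequences into one
    sparse feature dict, prefixing keys by their source sequence."""
    feats = {}
    for prefix, seq in (("nb", nanobody), ("ag", antigen)):
        for pattern, n in gapped_kmer_counts(seq, g, k).items():
            feats[f"{prefix}:{pattern}"] = n
    return feats
```

A sparse dictionary like the one returned by `pair_features` could then be vectorized and passed to any standard binary classifier to separate binding from nonbinding pairs. Which classifier and parameters the paper uses is not stated in this abstract.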
Related papers
- A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions [22.339823160991934]
We propose NanoPro-3M, the largest nanomaterial-protein interaction dataset to date, comprising over 3.2 million samples and 37,000 unique proteins. We present NanoProFormer, a foundational model that predicts nanomaterial-protein affinities through multimodal representation learning.
arXiv Detail & Related papers (2025-07-18T00:00:52Z) - Llama-Affinity: A Predictive Antibody Antigen Binding Model Integrating Antibody Sequences with Llama3 Backbone Architecture [2.474908349649168]
We present an advanced antibody-antigen binding affinity prediction model (Llamafinity). The model achieved an accuracy of 0.9640, an F1-score of 0.9643, a precision of 0.9702, a recall of 0.9586, and an AUC-ROC of 0.9936. This strategy unveiled higher computational efficiency, with a five-fold average cumulative training time of only 0.46 hours.
arXiv Detail & Related papers (2025-05-17T20:10:54Z) - NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks [6.485214172837228]
We introduce NbBench, the first comprehensive benchmark suite for nanobody representation learning. NbBench encompasses structure annotation, binding prediction, and developability assessment. Our analysis reveals that antibody language models excel in antigen-related tasks, while performance on regression tasks such as thermostability and affinity remains challenging.
arXiv Detail & Related papers (2025-05-04T08:18:10Z) - Leveraging Large Language Models to Predict Antibody Biological Activity Against Influenza A Hemagglutinin [0.15547733154162566]
We develop an AI model for predicting the binding and receptor blocking activity of antibodies against influenza A hemagglutinin (HA) antigens.
Our models achieved an AUROC $\geq$ 0.91 for predicting the activity of existing antibodies against seen HAs and an AUROC of 0.9 for unseen HAs.
arXiv Detail & Related papers (2025-02-02T06:48:45Z) - Precise Antigen-Antibody Structure Predictions Enhance Antibody Development with HelixFold-Multimer [7.702856943171885]
HelixFold-Multimer builds on the framework of AlphaFold-Multimer.
It provides insights into antibody development, enabling more precise identification of binding sites.
These advances underscore HelixFold-Multimer's potential in supporting antibody research and therapeutic innovation.
arXiv Detail & Related papers (2024-12-13T03:36:23Z) - HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide
Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude fewer parameters and less pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z) - AVIDa-hIL6: A Large-Scale VHH Dataset Produced from an Immunized Alpaca
for Predicting Antigen-Antibody Interactions [1.1381826108737396]
We have developed a large-scale dataset for predicting antigen-antibody interactions in the variable domain of heavy chain of heavy-chain antibodies (VHHs).
AVIDa-hIL6 contains 573,891 antigen-VHH pairs with amino acid sequences.
We report experimental benchmark results on AVIDa-hIL6 by using machine learning models.
arXiv Detail & Related papers (2023-06-06T00:42:36Z) - Random Copolymer inverse design system orienting on Accurate discovering
of Antimicrobial peptide-mimetic copolymers [9.416757363901295]
We develop a universal random copolymer inverse design system via multi-model copolymer representation learning, knowledge distillation and reinforcement learning.
By pre-training a scaffold-decorator generative model via knowledge distillation, the copolymer space is greatly contracted to the near space of existing data for exploration.
Our reinforcement learning algorithm can be adaptive for customized generation on specific scaffolds and requirements on property or structures.
arXiv Detail & Related papers (2022-11-30T14:29:50Z) - xTrimoABFold: De novo Antibody Structure Prediction without MSA [77.47606749555686]
We develop a novel model named xTrimoABFold to predict antibody structure from antibody sequence.
The model was trained end-to-end on the antibody structures in PDB by minimizing the ensemble loss of domain-specific focal loss on CDR and the frame-aligned point loss.
arXiv Detail & Related papers (2022-11-30T09:26:08Z) - Incorporating Pre-training Paradigm for Antibody Sequence-Structure
Co-design [134.65287929316673]
Deep learning-based computational antibody design has attracted popular attention since it automatically mines the antibody patterns from data that could be complementary to human experiences.
The computational methods heavily rely on high-quality antibody structure data, which is quite limited.
Fortunately, there exists a large amount of sequence data of antibodies that can help model the CDR and alleviate the reliance on structure data.
arXiv Detail & Related papers (2022-10-26T15:31:36Z) - Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results; however, the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes pretrained models on a source language to adapt to the tasks that are in a different language and have scarce data.
arXiv Detail & Related papers (2022-10-05T20:44:55Z) - Antibody Representation Learning for Drug Discovery [7.291511531280898]
We present results on a novel SARS-CoV-2 antibody binding dataset and an additional benchmark dataset.
We compare three classes of models: conventional statistical sequence models, supervised learning on each dataset independently, and fine-tuning an antibody specific pre-trained language model.
Experimental results suggest that self-supervised pretraining of feature representation consistently offers significant improvement over previous approaches.
arXiv Detail & Related papers (2022-10-05T13:48:41Z) - AntBO: Towards Real-World Automated Antibody Design with Combinatorial
Bayesian Optimisation [53.43922443725598]
We present AntBO: a Combinatorial optimisation algorithm enabling efficient in silico design of the CDRH3 region.
To benchmark AntBO, we use the Absolut! software suite as a black-box oracle because it can score the target specificity and affinity of designed antibodies in silico.
In under 200 protein designs, AntBO can suggest antibody sequences that outperform the best binding sequence drawn from 6.9 million experimentally obtained CDRH3s.
arXiv Detail & Related papers (2022-01-29T12:03:04Z) - Accelerating Antimicrobial Discovery with Controllable Deep Generative
Models and Molecular Dynamics [109.70543391923344]
CLaSS (Controlled Latent attribute Space Sampling) is an efficient computational method for attribute-controlled generation of molecules.
We screen the generated molecules for additional key attributes by using deep learning classifiers in conjunction with novel features derived from atomistic simulations.
The proposed approach is demonstrated for designing non-toxic antimicrobial peptides (AMPs) with strong broad-spectrum potency.
arXiv Detail & Related papers (2020-05-22T15:57:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.