Deciphering antibody affinity maturation with language models and weakly
supervised learning
- URL: http://arxiv.org/abs/2112.07782v1
- Date: Tue, 14 Dec 2021 23:05:01 GMT
- Title: Deciphering antibody affinity maturation with language models and weakly
supervised learning
- Authors: Jeffrey A. Ruffolo, Jeffrey J. Gray, Jeremias Sulam
- Abstract summary: We introduce AntiBERTy, a language model trained on 558M natural antibody sequences.
We find that within repertoires, our model clusters antibodies into trajectories resembling affinity maturation.
We show that models trained to predict highly redundant sequences under a multiple instance learning framework identify key binding residues in the process.
- Score: 10.506336354512145
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In response to pathogens, the adaptive immune system generates specific
antibodies that bind and neutralize foreign antigens. Understanding the
composition of an individual's immune repertoire can provide insights into this
process and reveal potential therapeutic antibodies. In this work, we explore
the application of antibody-specific language models to aid understanding of
immune repertoires. We introduce AntiBERTy, a language model trained on 558M
natural antibody sequences. We find that within repertoires, our model clusters
antibodies into trajectories resembling affinity maturation. Importantly, we
show that models trained to predict highly redundant sequences under a multiple
instance learning framework identify key binding residues in the process. With
further development, the methods presented here will provide new insights into
antigen binding from repertoire sequences alone.
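The multiple instance learning (MIL) setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-residue embeddings here are random stand-ins (in practice they would come from a language model such as AntiBERTy), and attention-based pooling is used as the MIL aggregator, a common choice that yields per-residue weights which can highlight candidate binding residues.

```python
# Minimal sketch of attention-based multiple instance learning (MIL)
# over per-residue embeddings. Hypothetical illustration only: the
# embeddings are random; in the paper's setting they would come from
# an antibody language model.
import numpy as np

rng = np.random.default_rng(0)

def mil_attention_pool(residue_embeddings, w, v):
    """Aggregate per-residue embeddings into one sequence-level embedding.

    The softmax attention weights indicate how much each residue
    contributes to the bag-level prediction, which is how an MIL model
    can surface candidate binding residues.
    """
    # Gated-attention style scoring: tanh projection, then a scalar score
    # per residue.
    scores = np.tanh(residue_embeddings @ w) @ v   # shape: (num_residues,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over residues
    pooled = weights @ residue_embeddings           # attention-weighted average
    return pooled, weights

# Toy "antibody": 10 residues with 8-dimensional embeddings.
emb = rng.normal(size=(10, 8))
w = rng.normal(size=(8, 4))   # projection matrix (hypothetical dimensions)
v = rng.normal(size=4)        # scoring vector

pooled, weights = mil_attention_pool(emb, w, v)
print(pooled.shape, weights.shape)   # (8,) (10,)
```

In a full MIL pipeline, `pooled` would feed a classifier trained on bag-level labels (e.g. redundancy of a clonotype), and `weights` would be inspected post hoc to rank residues.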
Related papers
- Relation-Aware Equivariant Graph Networks for Epitope-Unknown Antibody Design and Specificity Optimization [61.06622479173572]
We propose a novel Relation-Aware Design (RAAD) framework, which models antigen-antibody interactions for co-designing sequences and structures of antigen-specific CDRs.
Furthermore, we propose a new evaluation metric to better measure antibody specificity and develop a contrasting specificity-enhancing constraint to optimize the specificity of antibodies.
arXiv Detail & Related papers (2024-12-14T03:00:44Z)
- Precise Antigen-Antibody Structure Predictions Enhance Antibody Development with HelixFold-Multimer [7.702856943171885]
HelixFold-Multimer builds on the framework of AlphaFold-Multimer.
It provides insights into antibody development, enabling more precise identification of binding sites.
These advances underscore HelixFold-Multimer's potential in supporting antibody research and therapeutic innovation.
arXiv Detail & Related papers (2024-12-13T03:36:23Z)
- S$^2$ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning [8.059724314850799]
Antibodies safeguard our health through their precise and potent binding to specific antigens, demonstrating promising therapeutic efficacy in the treatment of numerous diseases, including COVID-19.
Recent advancements in biomedical language models have shown great potential for interpreting complex biological structures and functions.
This paper proposes the Sequence-Structure multi-level pre-trained antibody Language Model (S$^2$ALM), combining holistic sequential and structural information in one unified, generic antibody foundation model.
arXiv Detail & Related papers (2024-11-20T14:24:26Z)
- Large scale paired antibody language models [40.401345152825314]
We present IgBert and IgT5, the best performing antibody-specific language models developed to date.
These models are trained comprehensively on the Observed Antibody Space dataset of more than two billion sequences.
This advancement marks a significant leap forward in leveraging machine learning, large data sets and high-performance computing for enhancing antibody design for therapeutic development.
arXiv Detail & Related papers (2024-03-26T17:21:54Z)
- Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization [51.28231365213679]
We tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences.
We propose direct energy-based preference optimization to guide the generation of antibodies with both rational structures and considerable binding affinities to given antigens.
arXiv Detail & Related papers (2024-03-25T09:41:49Z)
- xTrimoABFold: De novo Antibody Structure Prediction without MSA [77.47606749555686]
We develop a novel model named xTrimoABFold to predict antibody structure from antibody sequence.
The model was trained end-to-end on the antibody structures in PDB by minimizing the ensemble loss of domain-specific focal loss on CDR and the frame-aligned point loss.
arXiv Detail & Related papers (2022-11-30T09:26:08Z)
- Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design [134.65287929316673]
Deep learning-based computational antibody design has attracted widespread attention, since it automatically mines antibody patterns from data that can complement human expertise.
The computational methods heavily rely on high-quality antibody structure data, which is quite limited.
Fortunately, there exists a large amount of sequence data of antibodies that can help model the CDR and alleviate the reliance on structure data.
arXiv Detail & Related papers (2022-10-26T15:31:36Z)
- Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results, however the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes pretrained models on a source language to adapt to the tasks that are in a different language and have scarce data.
arXiv Detail & Related papers (2022-10-05T20:44:55Z)
- Antibody Representation Learning for Drug Discovery [7.291511531280898]
We present results on a novel SARS-CoV-2 antibody binding dataset and an additional benchmark dataset.
We compare three classes of models: conventional statistical sequence models, supervised learning on each dataset independently, and fine-tuning an antibody specific pre-trained language model.
Experimental results suggest that self-supervised pretraining of feature representations consistently offers significant improvement over previous approaches.
arXiv Detail & Related papers (2022-10-05T13:48:41Z)
- Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics [109.70543391923344]
CLaSS (Controlled Latent attribute Space Sampling) is an efficient computational method for attribute-controlled generation of molecules.
We screen the generated molecules for additional key attributes by using deep learning classifiers in conjunction with novel features derived from atomistic simulations.
The proposed approach is demonstrated for designing non-toxic antimicrobial peptides (AMPs) with strong broad-spectrum potency.
arXiv Detail & Related papers (2020-05-22T15:57:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.