Deciphering antibody affinity maturation with language models and weakly
supervised learning
- URL: http://arxiv.org/abs/2112.07782v1
- Date: Tue, 14 Dec 2021 23:05:01 GMT
- Title: Deciphering antibody affinity maturation with language models and weakly
supervised learning
- Authors: Jeffrey A. Ruffolo, Jeffrey J. Gray, Jeremias Sulam
- Abstract summary: We introduce AntiBERTy, a language model trained on 558M natural antibody sequences.
We find that within repertoires, our model clusters antibodies into trajectories resembling affinity maturation.
We show that models trained to predict highly redundant sequences under a multiple instance learning framework identify key binding residues in the process.
- Score: 10.506336354512145
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In response to pathogens, the adaptive immune system generates specific
antibodies that bind and neutralize foreign antigens. Understanding the
composition of an individual's immune repertoire can provide insights into this
process and reveal potential therapeutic antibodies. In this work, we explore
the application of antibody-specific language models to aid understanding of
immune repertoires. We introduce AntiBERTy, a language model trained on 558M
natural antibody sequences. We find that within repertoires, our model clusters
antibodies into trajectories resembling affinity maturation. Importantly, we
show that models trained to predict highly redundant sequences under a multiple
instance learning framework identify key binding residues in the process. With
further development, the methods presented here will provide new insights into
antigen binding from repertoire sequences alone.
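As a rough illustration of the multiple instance learning idea described in the abstract, the sketch below scores per-residue importance with attention-based MIL pooling over residue embeddings. The architecture, dimensions, and names are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Bag = one antibody sequence; instances = per-residue embeddings."""
    def __init__(self, embed_dim: int = 512, attn_dim: int = 128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(embed_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, 1),
        )
        self.classifier = nn.Linear(embed_dim, 1)

    def forward(self, residue_embeddings: torch.Tensor):
        # residue_embeddings: (num_residues, embed_dim) for one sequence
        scores = self.attention(residue_embeddings)       # (num_residues, 1)
        weights = torch.softmax(scores, dim=0)            # residue importance
        bag = (weights * residue_embeddings).sum(dim=0)   # attention pooling
        return self.classifier(bag), weights.squeeze(-1)  # bag logit, weights

# Toy usage: a random 120 x 512 stand-in for per-residue language-model embeddings.
logit, residue_weights = AttentionMIL()(torch.randn(120, 512))
print(residue_weights.topk(5).indices)  # candidate key residues
```

After training on a bag-level objective, the attention weights provide a per-residue importance map of the kind used to flag candidate binding residues.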
Related papers
- Opponent Shaping for Antibody Development [49.26728828005039]
Anti-viral therapies are typically designed to target only the current strains of a virus.
However, therapy-induced selective pressures act on viruses and drive the emergence of mutated strains, against which the initial therapies have reduced efficacy.
We build on a computational model of binding between antibodies and viral antigens to implement a genetic simulation of viral evolutionary escape.
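A toy sketch of the kind of genetic simulation described above: antigen sequences mutate away from a fixed antibody "footprint" under selection. The Hamming-distance binding score and all parameters below are invented for illustration and are not the paper's binding model.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET, LENGTH, POP, GENS, MUT_RATE = 20, 30, 200, 50, 0.01
antibody_footprint = rng.integers(ALPHABET, size=LENGTH)   # residues the antibody "recognizes"

population = np.tile(antibody_footprint, (POP, 1))          # start fully bound
for gen in range(GENS):
    # Fitness: antigens escape by mutating away from the antibody footprint.
    fitness = (population != antibody_footprint).sum(axis=1)
    # Select parents proportionally to fitness (+1 avoids zero weights).
    probs = (fitness + 1) / (fitness + 1).sum()
    parents = population[rng.choice(POP, size=POP, p=probs)]
    # Apply point mutations.
    mask = rng.random(parents.shape) < MUT_RATE
    parents[mask] = rng.integers(ALPHABET, size=mask.sum())
    population = parents

print("mean escaped positions:", (population != antibody_footprint).sum(axis=1).mean())
```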
arXiv Detail & Related papers (2024-09-16T14:56:27Z)
- Large scale paired antibody language models [40.401345152825314]
We present IgBert and IgT5, the best performing antibody-specific language models developed to date.
These models are trained on more than two billion sequences from the Observed Antibody Space dataset.
This advancement marks a significant leap forward in leveraging machine learning, large data sets and high-performance computing for enhancing antibody design for therapeutic development.
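A hedged sketch of how such an antibody language model might be queried through the Hugging Face transformers API. The model identifier and the residue/chain formatting below are assumptions to verify against the authors' release.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "Exscientia/IgBert"  # assumed identifier, verify before use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

heavy = "EVQLVESGGGLVQPGGSLRLSCAAS"
light = "DIQMTQSPSSLSASVGDRVTITC"
# Space-separated residues with the heavy and light chains passed as a pair (illustrative).
inputs = tokenizer(" ".join(heavy), " ".join(light), return_tensors="pt")
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
print(embeddings.shape)
```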
arXiv Detail & Related papers (2024-03-26T17:21:54Z)
- Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization [51.28231365213679]
We tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences.
We propose direct energy-based preference optimization to guide the generation of antibodies with both rational structures and considerable binding affinities to given antigens.
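A minimal sketch of a DPO-style loss in which preference pairs are ranked by an energy function: for each antigen context, the lower-energy antibody design is treated as preferred. Function and argument names are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def energy_preference_dpo_loss(logp_low_e, logp_high_e,
                               ref_logp_low_e, ref_logp_high_e, beta=0.1):
    """logp_*: sequence log-likelihoods under the policy being tuned;
    ref_logp_*: the same sequences under a frozen reference model;
    *_low_e is the lower-energy (preferred) design in each pair."""
    preferred_margin = (logp_low_e - ref_logp_low_e) - (logp_high_e - ref_logp_high_e)
    return -F.logsigmoid(beta * preferred_margin).mean()

# Toy usage with random log-probabilities for a batch of 8 preference pairs.
loss = energy_preference_dpo_loss(torch.randn(8), torch.randn(8),
                                  torch.randn(8), torch.randn(8))
print(loss.item())
```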
arXiv Detail & Related papers (2024-03-25T09:41:49Z)
- AbODE: Ab Initio Antibody Design using Conjoined ODEs [8.523238510909955]
We develop a new generative model AbODE that extends graph PDEs to accommodate both contextual information and external interactions.
We unravel fundamental connections between AbODE and temporal networks as well as graph-matching networks.
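A minimal graph-ODE sketch, assuming node features that evolve under a learned message-passing velocity field integrated with a fixed-step Euler scheme; this illustrates the general idea only, not AbODE's parameterization of antibody-antigen interactions.

```python
import torch
import torch.nn as nn

class GraphODEFunc(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n_nodes, dim), adj: (n_nodes, n_nodes) 0/1 adjacency
        n = x.size(0)
        pairs = torch.cat([x.unsqueeze(1).expand(n, n, -1),
                           x.unsqueeze(0).expand(n, n, -1)], dim=-1)
        messages = (adj.unsqueeze(-1) * torch.tanh(self.msg(pairs))).sum(dim=1)
        return self.update(torch.cat([x, messages], dim=-1))  # dx/dt

func = GraphODEFunc()
x = torch.randn(10, 32)
adj = (torch.rand(10, 10) > 0.7).float()
for _ in range(20):            # fixed-step Euler integration of dx/dt
    x = x + 0.05 * func(x, adj)
print(x.shape)
```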
arXiv Detail & Related papers (2023-05-31T14:40:47Z)
- xTrimoABFold: De novo Antibody Structure Prediction without MSA [77.47606749555686]
We develop a novel model named xTrimoABFold to predict antibody structure from antibody sequence.
The model was trained end-to-end on antibody structures from the PDB by minimizing an ensemble loss that combines a domain-specific focal loss on the CDRs with the frame-aligned point loss.
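A simplified sketch of that loss structure only: a focal loss restricted to CDR positions combined with a clamped coordinate error. The real frame-aligned point loss is computed in per-residue local frames; the plain clamped L2 term below is a crude stand-in, not xTrimoABFold's exact objective.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                      # probability of the true class
    return (1 - pt) ** gamma * ce            # down-weight easy examples

def structure_loss(logits, targets, pred_xyz, true_xyz, cdr_mask, clamp=10.0):
    # Focal term averaged over CDR residues only.
    cdr_focal = (focal_loss(logits, targets) * cdr_mask).sum() / cdr_mask.sum()
    # Clamped coordinate error as a simplified proxy for frame-aligned point loss.
    point = torch.linalg.norm(pred_xyz - true_xyz, dim=-1).clamp(max=clamp).mean()
    return cdr_focal + point

# Toy usage: 100 residues, 20-way residue classification, 3D coordinates.
L = 100
loss = structure_loss(torch.randn(L, 20), torch.randint(20, (L,)),
                      torch.randn(L, 3), torch.randn(L, 3),
                      (torch.rand(L) > 0.7).float())
print(loss.item())
```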
arXiv Detail & Related papers (2022-11-30T09:26:08Z)
- Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design [134.65287929316673]
Deep learning-based computational antibody design has attracted growing attention because it automatically mines antibody patterns from data that can complement human expertise.
However, these computational methods rely heavily on high-quality antibody structure data, which remains quite limited.
Fortunately, there exists a large amount of sequence data of antibodies that can help model the CDR and alleviate the reliance on structure data.
arXiv Detail & Related papers (2022-10-26T15:31:36Z)
- Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results, however the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes a model pretrained on a source language for tasks in a different language where data are scarce.
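A minimal sketch of model reprogramming: keep the pretrained source network frozen and learn only a map from target-domain tokens (here antibody residues) into the source model's input space plus a small output head. Module names and sizes are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ReprogrammedLM(nn.Module):
    def __init__(self, source_model: nn.Module, source_dim: int, target_vocab: int = 25):
        super().__init__()
        self.source_model = source_model
        for p in self.source_model.parameters():
            p.requires_grad = False               # frozen source network
        self.input_map = nn.Embedding(target_vocab, source_dim)   # learned
        self.output_map = nn.Linear(source_dim, target_vocab)     # learned

    def forward(self, residue_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.source_model(self.input_map(residue_ids))
        return self.output_map(hidden)            # per-position residue logits

# Toy usage with a stand-in "source model" (a frozen MLP instead of a real LM).
source = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
model = ReprogrammedLM(source, source_dim=64)
logits = model(torch.randint(25, (2, 50)))        # batch of 2 sequences, length 50
print(logits.shape)                               # (2, 50, 25)
```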
arXiv Detail & Related papers (2022-10-05T20:44:55Z)
- Antibody Representation Learning for Drug Discovery [7.291511531280898]
We present results on a novel SARS-CoV-2 antibody binding dataset and an additional benchmark dataset.
We compare three classes of models: conventional statistical sequence models, supervised learning on each dataset independently, and fine-tuning an antibody specific pre-trained language model.
Experimental results suggest that self-supervised pretraining of the feature representation consistently offers significant improvement over previous approaches.
arXiv Detail & Related papers (2022-10-05T13:48:41Z)
- Neural message passing for joint paratope-epitope prediction [0.0]
Antibodies are proteins in the immune system which bind to antigens to detect and neutralise them.
The binding sites in an antibody-antigen interaction are known as the paratope and epitope, respectively; predicting them is key to vaccine and synthetic antibody development.
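A minimal message-passing sketch for per-residue binding-site prediction, assuming residue nodes and an adjacency over spatial or sequence neighbors; this shows the general pattern, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n_residues, dim); adj: (n_residues, n_residues) 0/1 adjacency
        agg = adj @ self.message(h)               # sum of neighbor messages
        return self.update(agg, h)                # GRU-style node update

layers = nn.ModuleList([MessagePassingLayer() for _ in range(3)])
readout = nn.Linear(64, 1)

h = torch.randn(200, 64)                          # residue features (antibody + antigen)
adj = (torch.rand(200, 200) > 0.95).float()
for layer in layers:
    h = layer(h, adj)
binding_prob = torch.sigmoid(readout(h)).squeeze(-1)   # per-residue binding probability
print(binding_prob.shape)
```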
arXiv Detail & Related papers (2021-05-31T16:37:55Z)
- Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics [109.70543391923344]
CLaSS (Controlled Latent attribute Space Sampling) is an efficient computational method for attribute-controlled generation of molecules.
We screen the generated molecules for additional key attributes by using deep learning classifiers in conjunction with novel features derived from atomistic simulations.
The proposed approach is demonstrated for designing non-toxic antimicrobial peptides (AMPs) with strong broad-spectrum potency.
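A rough sketch of attribute-controlled rejection sampling in a latent space: draw latent vectors, keep only those whose attribute classifiers predict the desired properties, then decode the accepted latents. The classifiers and decoder below are untrained placeholders, not the CLaSS models.

```python
import torch
import torch.nn as nn

latent_dim = 16
antimicrobial_clf = nn.Sequential(nn.Linear(latent_dim, 1), nn.Sigmoid())
toxicity_clf = nn.Sequential(nn.Linear(latent_dim, 1), nn.Sigmoid())
decoder = nn.Linear(latent_dim, 21 * 30)          # placeholder peptide decoder

accepted = []
while len(accepted) < 10:
    z = torch.randn(64, latent_dim)               # sample from the latent prior
    p_amp = antimicrobial_clf(z).squeeze(-1)      # predicted antimicrobial probability
    p_tox = toxicity_clf(z).squeeze(-1)           # predicted toxicity probability
    keep = (p_amp > 0.5) & (p_tox < 0.5)          # accept only desired attributes
    accepted.extend(z[keep])

peptide_logits = decoder(torch.stack(accepted[:10])).view(10, 30, 21)
print(peptide_logits.shape)                       # 10 peptides x 30 positions x 21 tokens
```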
arXiv Detail & Related papers (2020-05-22T15:57:58Z)