Deciphering antibody affinity maturation with language models and weakly
supervised learning
- URL: http://arxiv.org/abs/2112.07782v1
- Date: Tue, 14 Dec 2021 23:05:01 GMT
- Title: Deciphering antibody affinity maturation with language models and weakly
supervised learning
- Authors: Jeffrey A. Ruffolo, Jeffrey J. Gray, Jeremias Sulam
- Abstract summary: We introduce AntiBERTy, a language model trained on 558M natural antibody sequences.
We find that within repertoires, our model clusters antibodies into trajectories resembling affinity maturation.
We show that models trained to predict highly redundant sequences under a multiple instance learning framework identify key binding residues in the process.
- Score: 10.506336354512145
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In response to pathogens, the adaptive immune system generates specific
antibodies that bind and neutralize foreign antigens. Understanding the
composition of an individual's immune repertoire can provide insights into this
process and reveal potential therapeutic antibodies. In this work, we explore
the application of antibody-specific language models to aid understanding of
immune repertoires. We introduce AntiBERTy, a language model trained on 558M
natural antibody sequences. We find that within repertoires, our model clusters
antibodies into trajectories resembling affinity maturation. Importantly, we
show that models trained to predict highly redundant sequences under a multiple
instance learning framework identify key binding residues in the process. With
further development, the methods presented here will provide new insights into
antigen binding from repertoire sequences alone.
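The multiple instance learning (MIL) setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-residue embeddings here are random stand-ins (in practice they would come from a language model such as AntiBERTy), and attention-based pooling is used as the MIL aggregator, a common choice that yields per-residue weights which can highlight candidate binding residues.

```python
# Minimal sketch of attention-based multiple instance learning (MIL)
# over per-residue embeddings. Hypothetical illustration only: the
# embeddings are random; in the paper's setting they would come from
# an antibody language model.
import numpy as np

rng = np.random.default_rng(0)

def mil_attention_pool(residue_embeddings, w, v):
    """Aggregate per-residue embeddings into one sequence-level embedding.

    The softmax attention weights indicate how much each residue
    contributes to the bag-level prediction, which is how an MIL model
    can surface candidate binding residues.
    """
    # Gated-attention style scoring: tanh projection, then a scalar score
    # per residue.
    scores = np.tanh(residue_embeddings @ w) @ v   # shape: (num_residues,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over residues
    pooled = weights @ residue_embeddings           # attention-weighted average
    return pooled, weights

# Toy "antibody": 10 residues with 8-dimensional embeddings.
emb = rng.normal(size=(10, 8))
w = rng.normal(size=(8, 4))   # projection matrix (hypothetical dimensions)
v = rng.normal(size=4)        # scoring vector

pooled, weights = mil_attention_pool(emb, w, v)
print(pooled.shape, weights.shape)   # (8,) (10,)
```

In a full MIL pipeline, `pooled` would feed a classifier trained on bag-level labels (e.g. redundancy of a clonotype), and `weights` would be inspected post hoc to rank residues.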
Related papers
- Relation-Aware Equivariant Graph Networks for Epitope-Unknown Antibody Design and Specificity Optimization [61.06622479173572]
We propose a novel Relation-Aware Design (RAAD) framework, which models antigen-antibody interactions for co-designing sequences and structures of antigen-specific CDRs.
Furthermore, we propose a new evaluation metric to better measure antibody specificity and develop a contrasting specificity-enhancing constraint to optimize the specificity of antibodies.
arXiv Detail & Related papers (2024-12-14T03:00:44Z)
- Precise Antigen-Antibody Structure Predictions Enhance Antibody Development with HelixFold-Multimer [7.702856943171885]
HelixFold-Multimer builds on the framework of AlphaFold-Multimer.
It provides insights into antibody development, enabling more precise identification of binding sites.
These advances underscore HelixFold-Multimer's potential in supporting antibody research and therapeutic innovation.
arXiv Detail & Related papers (2024-12-13T03:36:23Z)
- S$^2$ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning [8.059724314850799]
Antibodies safeguard our health through their precise and potent binding to specific antigens, demonstrating promising therapeutic efficacy in the treatment of numerous diseases, including COVID-19.
Recent advancements in biomedical language models have shown great potential for interpreting complex biological structures and functions.
This paper proposes the Sequence-Structure multi-level pre-trained antibody Language Model (S$^2$ALM), combining holistic sequential and structural information in one unified, generic antibody foundation model.
arXiv Detail & Related papers (2024-11-20T14:24:26Z)
- Large scale paired antibody language models [40.401345152825314]
We present IgBert and IgT5, the best performing antibody-specific language models developed to date.
These models are trained comprehensively on the Observed Antibody Space dataset of more than two billion sequences.
This advancement marks a significant leap forward in leveraging machine learning, large data sets and high-performance computing for enhancing antibody design for therapeutic development.
arXiv Detail & Related papers (2024-03-26T17:21:54Z)
- Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization [51.28231365213679]
We tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences.
We propose direct energy-based preference optimization to guide the generation of antibodies with both rational structures and considerable binding affinities to given antigens.
arXiv Detail & Related papers (2024-03-25T09:41:49Z)
- xTrimoABFold: De novo Antibody Structure Prediction without MSA [77.47606749555686]
We develop a novel model named xTrimoABFold to predict antibody structure from antibody sequence.
The model was trained end-to-end on the antibody structures in PDB by minimizing the ensemble loss of domain-specific focal loss on CDR and the frame-aligned point loss.
arXiv Detail & Related papers (2022-11-30T09:26:08Z)
- Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design [134.65287929316673]
Deep learning-based computational antibody design has attracted widespread attention, since it automatically mines antibody patterns from data that can complement human expertise.
The computational methods heavily rely on high-quality antibody structure data, which is quite limited.
Fortunately, there exists a large amount of sequence data of antibodies that can help model the CDR and alleviate the reliance on structure data.
arXiv Detail & Related papers (2022-10-26T15:31:36Z)
- Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results, however the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes pretrained models on a source language to adapt to the tasks that are in a different language and have scarce data.
arXiv Detail & Related papers (2022-10-05T20:44:55Z)
- Antibody Representation Learning for Drug Discovery [7.291511531280898]
We present results on a novel SARS-CoV-2 antibody binding dataset and an additional benchmark dataset.
We compare three classes of models: conventional statistical sequence models, supervised learning on each dataset independently, and fine-tuning an antibody specific pre-trained language model.
Experimental results suggest that self-supervised pretraining of feature representations consistently offers significant improvement over previous approaches.
arXiv Detail & Related papers (2022-10-05T13:48:41Z)
- Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics [109.70543391923344]
CLaSS (Controlled Latent attribute Space Sampling) is an efficient computational method for attribute-controlled generation of molecules.
We screen the generated molecules for additional key attributes by using deep learning classifiers in conjunction with novel features derived from atomistic simulations.
The proposed approach is demonstrated for designing non-toxic antimicrobial peptides (AMPs) with strong broad-spectrum potency.
arXiv Detail & Related papers (2020-05-22T15:57:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.