Antibody Representation Learning for Drug Discovery
- URL: http://arxiv.org/abs/2210.02881v1
- Date: Wed, 5 Oct 2022 13:48:41 GMT
- Title: Antibody Representation Learning for Drug Discovery
- Authors: Lin Li, Esther Gupta, John Spaeth, Leslie Shing, Tristan Bepler,
Rajmonda Sulo Caceres
- Abstract summary: We present results on a novel SARS-CoV-2 antibody binding dataset and an additional benchmark dataset.
We compare three classes of models: conventional statistical sequence models, supervised learning on each dataset independently, and fine-tuning an antibody specific pre-trained language model.
Experimental results suggest that self-supervised pretraining of feature representation consistently offers significant improvement over previous approaches.
- Score: 7.291511531280898
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Therapeutic antibody development has become an increasingly popular approach
for drug development. To date, antibody therapeutics are largely developed
using large scale experimental screens of antibody libraries containing
hundreds of millions of antibody sequences. The high cost and difficulty of
developing therapeutic antibodies create a pressing need for computational
methods to predict antibody properties and create bespoke designs. However, the
relationship between antibody sequence and activity is a complex physical
process and traditional iterative design approaches rely on large scale assays
and random mutagenesis. Deep learning methods have emerged as a promising way
to learn antibody property predictors, but predicting antibody properties and
target-specific activities depends critically on the choice of antibody
representations and data linking sequences to properties is often limited.
Existing works have not yet investigated the value, limitations and
opportunities of these methods in application to antibody-based drug discovery.
In this paper, we present results on a novel SARS-CoV-2 antibody binding
dataset and an additional benchmark dataset. We compare three classes of
models: conventional statistical sequence models, supervised learning on each
dataset independently, and fine-tuning an antibody specific pre-trained
language model. Experimental results suggest that self-supervised pretraining
of feature representation consistently offers significant improvement over
previous approaches. We also investigate the impact of data size on the model
performance, and discuss challenges and opportunities that the machine learning
community can address to advance in silico engineering and design of
therapeutic antibodies.
Related papers
- Large scale paired antibody language models [40.401345152825314]
We present IgBert and IgT5, the best performing antibody-specific language models developed to date.
These models are trained comprehensively using the more than two billion sequences in the Observed Antibody Space dataset.
This advancement marks a significant leap forward in leveraging machine learning, large data sets and high-performance computing for enhancing antibody design for therapeutic development.
arXiv Detail & Related papers (2024-03-26T17:21:54Z) - Improving Antibody Humanness Prediction using Patent Data [6.185604158465185]
We investigate the potential of patent data for improving the antibody humanness prediction using a multi-stage, multi-loss training process.
We pose the initial learning stage as a weakly-supervised contrastive-learning problem.
We then freeze a part of the contrastive encoder and continue training it on the patent data using the cross-entropy loss to predict the humanness score of a given antibody sequence.
arXiv Detail & Related papers (2024-01-25T16:04:17Z) - AI driven B-cell Immunotherapy Design [0.0]
The effectiveness of antigen neutralisation and elimination hinges upon the strength, sensitivity, and specificity of the paratope-epitope interaction.
In recent years, artificial intelligence and machine learning methods have made significant strides, revolutionising the prediction of protein structures and their complexes.
This review focuses on the progress of machine learning-based tools and their frameworks in the domain of B-cell immunotherapy design.
arXiv Detail & Related papers (2023-09-03T09:14:10Z) - Drug Synergistic Combinations Predictions via Large-Scale Pre-Training
and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z) - xTrimoABFold: De novo Antibody Structure Prediction without MSA [77.47606749555686]
We develop a novel model named xTrimoABFold to predict antibody structure from antibody sequence.
The model was trained end-to-end on the antibody structures in PDB by minimizing the ensemble loss of domain-specific focal loss on CDR and the frame-aligned point loss.
arXiv Detail & Related papers (2022-11-30T09:26:08Z) - Incorporating Pre-training Paradigm for Antibody Sequence-Structure
Co-design [134.65287929316673]
Deep learning-based computational antibody design has attracted popular attention since it automatically mines the antibody patterns from data that could be complementary to human experiences.
The computational methods heavily rely on high-quality antibody structure data, which is quite limited.
Fortunately, there exists a large amount of sequence data of antibodies that can help model the CDR and alleviate the reliance on structure data.
arXiv Detail & Related papers (2022-10-26T15:31:36Z) - Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results; however, the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes pretrained models on a source language to adapt to the tasks that are in a different language and have scarce data.
arXiv Detail & Related papers (2022-10-05T20:44:55Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of
Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - Sequence-based deep learning antibody design for in silico antibody
affinity maturation [0.0]
The optimization step for therapeutic leads is increasingly popular in the antibody discovery pipeline.
Traditional methods and in silico approaches aim to generate candidates with high binding affinity against specific target antigens.
In the present study, we investigated different graph-based designs for depicting antibody-antigen interactions in terms of antibody affinity prediction.
arXiv Detail & Related papers (2021-02-21T02:48:31Z) - Accelerating Antimicrobial Discovery with Controllable Deep Generative
Models and Molecular Dynamics [109.70543391923344]
CLaSS (Controlled Latent attribute Space Sampling) is an efficient computational method for attribute-controlled generation of molecules.
We screen the generated molecules for additional key attributes by using deep learning classifiers in conjunction with novel features derived from atomistic simulations.
The proposed approach is demonstrated for designing non-toxic antimicrobial peptides (AMPs) with strong broad-spectrum potency.
arXiv Detail & Related papers (2020-05-22T15:57:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.