Incorporating Pre-training Paradigm for Antibody Sequence-Structure
Co-design
- URL: http://arxiv.org/abs/2211.08406v2
- Date: Thu, 17 Nov 2022 13:51:42 GMT
- Title: Incorporating Pre-training Paradigm for Antibody Sequence-Structure
Co-design
- Authors: Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Tianbo Peng, Yingce Xia, Liang He,
Shufang Xie, Tao Qin, Haiguang Liu, Kun He, Tie-Yan Liu
- Abstract summary: Deep learning-based computational antibody design has attracted wide attention since it automatically mines antibody patterns from data that could be complementary to human experience.
The computational methods heavily rely on high-quality antibody structure data, which is quite limited.
Fortunately, there exists a large amount of sequence data of antibodies that can help model the CDR and alleviate the reliance on structure data.
- Score: 134.65287929316673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Antibodies are versatile proteins that can bind to pathogens and provide
effective protection for the human body. Recently, deep learning-based
computational antibody design has attracted wide attention since it
automatically mines antibody patterns from data that could be complementary
to human experience. However, the computational methods heavily rely on
high-quality antibody structure data, which is quite limited. Besides, the
complementarity-determining region (CDR), which is the key component of an
antibody that determines the specificity and binding affinity, is highly
variable and hard to predict. Therefore, the data limitation issue further
raises the difficulty of CDR generation for antibodies. Fortunately, there
exists a large amount of sequence data of antibodies that can help model the
CDR and alleviate the reliance on structure data. Motivated by the success of
pre-trained models for protein modeling, in this paper we develop an
antibody pre-training language model and incorporate it into the
(antigen-specific) antibody design model in a systematic way. Specifically, we
first pre-train an antibody language model on the sequence data, then
propose a one-shot approach to CDR sequence and structure generation that avoids
the heavy cost and error propagation of autoregressive generation, and finally
incorporate the pre-trained antibody model into the antigen-specific antibody
generation model through some carefully designed modules. Through various
experiments, we show that our method achieves superior performance over
previous baselines on different tasks, such as sequence and structure
generation and antigen-binding CDR-H3 design.
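The contrast the abstract draws between one-shot and autoregressive CDR generation can be illustrated with a toy sketch. This is not the authors' model: `toy_logits` is a hypothetical stand-in for a trained network, and the function names, the greedy decoding, and the example framework string are all assumptions made for illustration. The sketch only shows why one-shot decoding needs a single model call and cannot propagate an early mistake into later positions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_logits(context, position):
    """Hypothetical stand-in for a trained model (an assumption, not the paper's network).

    Returns one deterministic score per amino acid, depending on the context
    string and the position being predicted.
    """
    seed = sum(ord(c) for c in context) + 7 * position
    return [math.sin(seed + 3 * k) for k in range(len(AMINO_ACIDS))]


def argmax_residue(logits):
    """Pick the highest-scoring amino acid (greedy choice)."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return AMINO_ACIDS[best]


def decode_autoregressive(framework, cdr_len):
    """Generate residues one by one; each step conditions on earlier outputs.

    Needs cdr_len sequential model calls, and an early wrong residue changes
    the context seen by every later step (error propagation).
    """
    generated, calls = "", 0
    for pos in range(cdr_len):
        logits = toy_logits(framework + generated, pos)
        calls += 1
        generated += argmax_residue(logits)
    return generated, calls


def decode_one_shot(framework, cdr_len):
    """Predict every CDR position from the same fixed context.

    The list comprehension stands in for one batched forward pass, so the
    call count is reported as 1.
    """
    all_logits = [toy_logits(framework, pos) for pos in range(cdr_len)]
    return "".join(argmax_residue(l) for l in all_logits), 1


if __name__ == "__main__":
    framework = "EVQLVESGGGLVQPGG"  # illustrative framework fragment, not from the paper
    print(decode_autoregressive(framework, 8))
    print(decode_one_shot(framework, 8))
```

In the autoregressive loop the context grows with each generated residue, whereas the one-shot path scores all positions against the unchanged framework context.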
Related papers
- Large scale paired antibody language models [40.401345152825314]
We present IgBert and IgT5, the best performing antibody-specific language models developed to date.
These models are trained comprehensively using the more than two billion sequences in the Observed Antibody Space dataset.
This advancement marks a significant leap forward in leveraging machine learning, large data sets and high-performance computing for enhancing antibody design for therapeutic development.
arXiv Detail & Related papers (2024-03-26T17:21:54Z)
- Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization [51.28231365213679]
We tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences.
We propose direct energy-based preference optimization to guide the generation of antibodies with both rational structures and considerable binding affinities to given antigens.
arXiv Detail & Related papers (2024-03-25T09:41:49Z)
- A Hierarchical Training Paradigm for Antibody Structure-sequence
Co-design [54.30457372514873]
We propose a hierarchical training paradigm (HTP) for the antibody sequence-structure co-design.
HTP consists of four levels of training stages, each corresponding to a specific protein modality.
Empirical experiments show that HTP sets the new state-of-the-art performance in the co-design problem.
arXiv Detail & Related papers (2023-10-30T02:39:15Z)
- Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot
Antibody Designer [58.97153056120193]
The specificity of an antibody is determined by its complementarity-determining regions (CDRs).
Previous studies have utilized complex techniques to generate CDRs, but they suffer from inadequate geometric modeling.
We propose a simple yet effective model that can co-design 1D sequences and 3D structures of CDRs in a one-shot manner.
arXiv Detail & Related papers (2023-04-21T13:24:26Z)
- xTrimoABFold: De novo Antibody Structure Prediction without MSA [77.47606749555686]
We develop a novel model named xTrimoABFold to predict antibody structure from antibody sequence.
The model was trained end-to-end on the antibody structures in PDB by minimizing the ensemble loss of domain-specific focal loss on CDR and the frame-aligned point loss.
arXiv Detail & Related papers (2022-11-30T09:26:08Z)
- Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results; however, the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes pretrained models on a source language to adapt to the tasks that are in a different language and have scarce data.
arXiv Detail & Related papers (2022-10-05T20:44:55Z)
- Antibody Representation Learning for Drug Discovery [7.291511531280898]
We present results on a novel SARS-CoV-2 antibody binding dataset and an additional benchmark dataset.
We compare three classes of models: conventional statistical sequence models, supervised learning on each dataset independently, and fine-tuning an antibody-specific pre-trained language model.
Experimental results suggest that self-supervised pretraining of feature representation consistently offers significant improvement over previous approaches.
arXiv Detail & Related papers (2022-10-05T13:48:41Z)
- Iterative Refinement Graph Neural Network for Antibody
Sequence-Structure Co-design [35.215029426177004]
We propose a generative model to automatically design antibodies with enhanced binding specificity or neutralization capabilities.
Our method achieves superior log-likelihood on the test set and outperforms previous baselines in designing antibodies capable of neutralizing the SARS-CoV-2 virus.
arXiv Detail & Related papers (2021-10-09T18:23:32Z)