Decoupled Sequence and Structure Generation for Realistic Antibody Design
- URL: http://arxiv.org/abs/2402.05982v2
- Date: Mon, 27 May 2024 08:24:55 GMT
- Title: Decoupled Sequence and Structure Generation for Realistic Antibody Design
- Authors: Nayoung Kim, Minsu Kim, Sungsoo Ahn, Jinkyoo Park,
- Abstract summary: We propose an antibody sequence-structure decoupling (ASSD) framework, which separates sequence generation and structure prediction.
We also find that the widely used non-autoregressive generators promote sequences with overly repeating tokens.
Our results demonstrate that ASSD consistently outperforms existing antibody design models.
- Score: 45.72237864940556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Antibody design plays a pivotal role in advancing therapeutics. Although deep learning has made rapid progress in this field, existing methods jointly generate antibody sequences and structures, limiting task-specific optimization. In response, we propose an antibody sequence-structure decoupling (ASSD) framework, which separates sequence generation and structure prediction. Although our approach is simple, such a decoupling strategy has been overlooked in previous works. We also find that the widely used non-autoregressive generators promote sequences with overly repeating tokens. Such sequences are both out-of-distribution and prone to undesirable developability properties that can trigger harmful immune responses in patients. To resolve this, we introduce a composition-based objective that allows an efficient trade-off between high performance and low token repetition. Our results demonstrate that ASSD consistently outperforms existing antibody design models, while the composition-based objective successfully mitigates token repetition of non-autoregressive models. Our code is available at \url{https://github.com/lkny123/ASSD_public}.
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on-par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z) - Inverse folding for antibody sequence design using deep learning [2.8998926117101367]
We propose a fine-tuned folding inverse model that is specifically optimised for antibody structures.
We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters.
arXiv Detail & Related papers (2023-10-30T13:12:41Z) - SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z) - Enhancing Few-shot NER with Prompt Ordering based Data Augmentation [59.69108119752584]
We propose a Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks.
Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-19T16:25:43Z) - Extrapolative Controlled Sequence Generation via Iterative Refinement [22.42501277690634]
We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training.
In this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation.
Results on one natural language task (sentiment analysis) and two protein engineering tasks (ACE2 stability and AAV fitness) show that ICE considerably outperforms state-of-the-art approaches despite its simplicity.
arXiv Detail & Related papers (2023-03-08T13:21:27Z) - Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results, however the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes pretrained models on a source language to adapt to the tasks that are in a different language and have scarce data.
arXiv Detail & Related papers (2022-10-05T20:44:55Z) - Benchmarking deep generative models for diverse antibody sequence design [18.515971640245997]
Deep generative models that learn from sequences alone or from sequences and structures jointly have shown impressive performance on this task.
We consider three recently proposed deep generative frameworks for protein design: (AR) the sequence-based autoregressive generative model, (GVP) the precise structure-based graph neural network, and Fold2Seq that leverages a fuzzy and scale-free representation of a three-dimensional fold.
We benchmark these models on the task of computational design of antibody sequences, which demand designing sequences with high diversity for functional implication.
arXiv Detail & Related papers (2021-11-12T16:23:32Z) - Adversarial and Contrastive Variational Autoencoder for Sequential
Recommendation [25.37244686572865]
We propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation.
We first introduce the adversarial training for sequence generation under the Adversarial Variational Bayes framework, which enables our model to generate high-quality latent variables.
Besides, when encoding the sequence, we apply a recurrent and convolutional structure to capture global and local relationships in the sequence.
arXiv Detail & Related papers (2021-03-19T09:01:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.