Benchmarking deep generative models for diverse antibody sequence design
- URL: http://arxiv.org/abs/2111.06801v1
- Date: Fri, 12 Nov 2021 16:23:32 GMT
- Title: Benchmarking deep generative models for diverse antibody sequence design
- Authors: Igor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano
- Abstract summary: Deep generative models that learn from sequences alone or from sequences and structures jointly have shown impressive performance on this task.
We consider three recently proposed deep generative frameworks for protein design: (AR) the sequence-based autoregressive generative model, (GVP) the precise structure-based graph neural network, and (Fold2Seq) a framework that leverages a fuzzy and scale-free representation of a three-dimensional fold.
We benchmark these models on the task of computational design of antibody sequences, which demands designing highly diverse sequences for functional reasons.
- Score: 18.515971640245997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computational protein design, i.e. inferring novel and diverse protein
sequences consistent with a given structure, remains a major unsolved
challenge. Recently, deep generative models that learn from sequences alone or
from sequences and structures jointly have shown impressive performance on this
task. However, those models appear limited in terms of modeling structural
constraints, capturing enough sequence diversity, or both. Here we consider
three recently proposed deep generative frameworks for protein design: (AR) the
sequence-based autoregressive generative model, (GVP) the precise
structure-based graph neural network, and Fold2Seq, which leverages a fuzzy and
scale-free representation of a three-dimensional fold, while enforcing
structure-to-sequence (and vice versa) consistency. We benchmark these models
on the task of computational design of antibody sequences, which demands
designing highly diverse sequences for functional reasons. The
Fold2Seq framework outperforms the two other baselines in terms of diversity of
the designed sequences, while maintaining the typical fold.
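Diversity here is a property of the whole set of designed sequences, so it is typically estimated from pairwise comparisons. Below is a minimal sketch of one such measure, mean normalized edit distance; the exact metric and normalization used in the paper are not reproduced here, so treat this as an illustrative assumption.

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming over a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def mean_pairwise_diversity(seqs: list[str]) -> float:
    """Average normalized edit distance over all pairs: 0 = all designs identical."""
    dists = [edit_distance(a, b) / max(len(a), len(b))
             for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists)

# Hypothetical CDR-H3-like designs, for illustration only.
designs = ["ARDGYSSGWYFDV", "ARDLGYSPFDY", "ARGGTFDY", "ARDPGYSSGWYFDV"]
print(f"mean pairwise diversity: {mean_pairwise_diversity(designs):.3f}")
```

Fold preservation ("maintaining the typical fold") would be checked separately, e.g. by predicting the structure of each design and comparing it against the target fold.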
Related papers
- DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends the discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals (see the toy illustration after this entry).
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z)
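The phrase "joint distribution ... as well as their marginals and conditionals" has a direct probabilistic reading: one model supports sequence-only generation, structure-only generation, folding, and inverse folding. A toy numpy illustration with made-up two-variable distributions (unrelated to DPLM-2's actual diffusion parameterization):

```python
import numpy as np

# Hypothetical joint p(sequence_class, structure_class) over tiny discrete spaces:
# 3 sequence classes (rows) x 2 structure classes (columns). Values are made up.
joint = np.array([[0.30, 0.05],
                  [0.10, 0.25],
                  [0.05, 0.25]])
assert np.isclose(joint.sum(), 1.0)

p_seq = joint.sum(axis=1)                        # marginal: sequence-only generation
p_struct = joint.sum(axis=0)                     # marginal: structure-only generation
p_struct_given_seq = joint / p_seq[:, None]      # conditional: "folding" direction
p_seq_given_struct = joint / p_struct[None, :]   # conditional: "inverse folding" direction

print("p(struct | seq=0) =", p_struct_given_seq[0])     # each row sums to 1
print("p(seq | struct=1) =", p_seq_given_struct[:, 1])  # each column sums to 1
```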
- Approximate learning of parsimonious Bayesian context trees [0.0]
The proposed framework is tested on synthetic and real-world data examples.
It outperforms existing sequence models when fitted to real protein sequences and honeypot computer terminal sessions.
arXiv Detail & Related papers (2024-07-27T11:50:40Z)
- Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation [55.93511121486321]
We introduce FoldFlow-2, a novel sequence-conditioned flow matching model for protein structure generation.
We train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than the PDB datasets used in prior works.
We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models.
arXiv Detail & Related papers (2024-05-30T17:53:50Z)
- FoldToken: Learning Protein Language via Vector Quantization and Beyond [56.19308144551836]
We introduce FoldTokenizer to represent protein sequence-structure as discrete symbols.
We refer to the learned symbols as FoldTokens, and the sequence of FoldTokens serves as a new protein language (a minimal sketch of the quantization step follows this entry).
arXiv Detail & Related papers (2024-02-04T12:18:51Z)
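Vector quantization, as used here, snaps each continuous structure embedding to its nearest entry in a learned codebook, turning a 3D structure into a string of discrete symbols. A minimal sketch of that assignment step (codebook size, dimensions, and inputs are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 32))            # 256 learned code vectors, dim 32 (made up)
residue_embeddings = rng.normal(size=(120, 32))  # toy encoder output, one vector per residue

# Assignment step: map each continuous embedding to its nearest code index.
d2 = ((residue_embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
fold_tokens = d2.argmin(axis=1)                  # shape (120,), integers in [0, 256)

# The discrete token string is the "protein language" sentence for this structure.
print(fold_tokens[:10])
```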
- Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences while maintaining structural consistency.
Recent deep learning models have shown impressive results; however, the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes a model pretrained on a source language for tasks in a different language with scarce data (a schematic sketch of the idea follows this entry).
arXiv Detail & Related papers (2022-10-05T20:44:55Z)
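Model Reprogramming keeps the pretrained source-language model frozen and trains only small input and output mappings between the source domain and the amino acid vocabulary. A schematic numpy sketch of this wiring (shapes and the stand-in model are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, TGT_VOCAB = 64, 25            # hypothetical sizes: embedding dim, amino acid vocab

# Frozen pretrained source-language model (stand-in: one fixed layer).
W_FROZEN = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)

def frozen_model(x: np.ndarray) -> np.ndarray:
    """Embeddings in, contextual embeddings out; weights are never updated."""
    return np.tanh(x @ W_FROZEN)

# Only these two small maps would be trained for the antibody task.
input_map = rng.normal(size=(TGT_VOCAB, DIM)) * 0.02    # amino acid id -> source embedding
output_map = rng.normal(size=(DIM, TGT_VOCAB)) * 0.02   # source output -> amino acid logits

antibody_tokens = rng.integers(0, TGT_VOCAB, size=30)   # toy target-domain sequence
logits = frozen_model(input_map[antibody_tokens]) @ output_map
print(logits.shape)                                     # (30, 25): per-position scores
```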
- Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design [70.27706384570723]
We propose Fold2Seq, a novel framework for designing protein sequences conditioned on a specific target fold.
We show improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design.
The unique advantages of fold-based Fold2Seq, in comparison to a structure-based deep model and RosettaDesign, become more evident on three additional real-world challenges (a rough sketch of the fold representation idea follows this entry).
arXiv Detail & Related papers (2021-06-24T14:34:24Z)
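The "fuzzy and scale-free" fold representation can be pictured as a coarse spatial density rather than exact coordinates: normalizing a structure to the unit cube removes scale, and Gaussian smearing makes the representation tolerant to small structural differences. A rough sketch of that idea (grid size, smoothing, and the use of a plain residue density rather than secondary-structure channels are assumptions, not the exact Fold2Seq featurization):

```python
import numpy as np

def fuzzy_fold_density(coords: np.ndarray, grid: int = 8, sigma: float = 1.0) -> np.ndarray:
    """Smear residue coordinates into a coarse, normalized voxel density."""
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    c = (coords - lo) / (hi - lo + 1e-8)                # scale-free: unit cube
    centers = (np.arange(grid) + 0.5) / grid            # voxel centers per axis
    density = np.zeros((grid, grid, grid))
    for x, y, z in c:
        # Fuzzy: separable Gaussian weight of each residue to every voxel.
        wx = np.exp(-((centers - x) * grid) ** 2 / (2 * sigma**2))
        wy = np.exp(-((centers - y) * grid) ** 2 / (2 * sigma**2))
        wz = np.exp(-((centers - z) * grid) ** 2 / (2 * sigma**2))
        density += wx[:, None, None] * wy[None, :, None] * wz[None, None, :]
    return density / density.sum()

# Toy C-alpha coordinates for a 100-residue protein (random, for illustration).
coords = np.random.default_rng(2).normal(size=(100, 3))
print(fuzzy_fold_density(coords).shape)  # (8, 8, 8)
```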
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.