Multi-segment preserving sampling for deep manifold sampler
- URL: http://arxiv.org/abs/2205.04259v1
- Date: Mon, 9 May 2022 13:19:41 GMT
- Title: Multi-segment preserving sampling for deep manifold sampler
- Authors: Daniel Berenberg, Jae Hyeon Lee, Simon Kelow, Ji Won Park, Andrew
Watkins, Vladimir Gligorijević, Richard Bonneau, Stephen Ra, Kyunghyun Cho
- Abstract summary: Multi-segment preserving sampling enables the direct inclusion of domain-specific knowledge.
We train two models: a deep manifold sampler and a GPT-2 language model on nearly six million heavy chain sequences annotated with the IGHV1-18 gene.
We obtain log probability scores from a GPT-2 model for each sampled CDR3 and demonstrate that multi-segment preserving sampling generates reasonable designs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep generative modeling for biological sequences presents a unique challenge
in reconciling the bias-variance trade-off between explicit biological insight
and model flexibility. The deep manifold sampler was recently proposed as a
means to iteratively sample variable-length protein sequences by exploiting the
gradients from a function predictor. We introduce an alternative approach to
this guided sampling procedure, multi-segment preserving sampling, that enables
the direct inclusion of domain-specific knowledge by designating preserved and
non-preserved segments along the input sequence, thereby restricting variation
to only select regions. We present its effectiveness in the context of antibody
design by training two models: a deep manifold sampler and a GPT-2 language
model on nearly six million heavy chain sequences annotated with the IGHV1-18
gene. During sampling, we restrict variation to only the
complementarity-determining region 3 (CDR3) of the input. We obtain log
probability scores from a GPT-2 model for each sampled CDR3 and demonstrate
that multi-segment preserving sampling generates reasonable designs while
maintaining the desired, preserved regions.
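The core mechanism of multi-segment preserving sampling (fixing designated segments and restricting substitutions to the remainder, then scoring candidates) can be illustrated with a minimal, hypothetical sketch. This is not the deep manifold sampler itself: the sequence, segment boundaries, mutation count, and the unigram `log_prob` stand-in for a GPT-2 score are all illustrative assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def preserve_mask(length, preserved_segments):
    """Boolean mask: True at positions that must be kept fixed."""
    mask = [False] * length
    for start, end in preserved_segments:  # half-open intervals [start, end)
        for i in range(start, end):
            mask[i] = True
    return mask

def sample_variant(seq, preserved_segments, n_mutations=3, rng=None):
    """Propose a variant, substituting only at non-preserved positions."""
    rng = rng or random.Random(0)
    mask = preserve_mask(len(seq), preserved_segments)
    editable = [i for i, keep in enumerate(mask) if not keep]
    variant = list(seq)
    for i in rng.sample(editable, min(n_mutations, len(editable))):
        variant[i] = rng.choice(AMINO_ACIDS)
    return "".join(variant)

def log_prob(seq, unigram):
    """Toy stand-in for a language-model score: sum of per-residue log frequencies."""
    return sum(math.log(unigram.get(a, 1e-6)) for a in seq)

# Hypothetical heavy-chain fragment; the segments flanking a CDR3-like
# span (positions 10..17) are preserved, so only that span may vary.
seq = "EVQLVESGGGLVQPGGSARDYWGQG"
preserved = [(0, 10), (18, len(seq))]
variant = sample_variant(seq, preserved)
unigram = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
score = log_prob(variant, unigram)
```

In the paper's setting, `sample_variant` would be replaced by the gradient-guided deep manifold sampler and `log_prob` by log-probability scores from the trained GPT-2 model; the masking logic that keeps the preserved segments intact is the part this sketch demonstrates.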
Related papers
- Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues.
We combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution.
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
arXiv Detail & Related papers (2024-08-13T08:09:05Z)
- Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density [70.14884528360199]
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with enhanced fidelity or increased diversity.
Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density.
arXiv Detail & Related papers (2024-07-11T16:46:04Z)
- MTS-DVGAN: Anomaly Detection in Cyber-Physical Systems using a Dual Variational Generative Adversarial Network [7.889342625283858]
Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of Cyber-physical systems (CPSs) without relying on labeled information.
This article proposes a novel unsupervised dual variational generative adversarial model named MTS-DVGAN.
The central concept is to enhance the model's discriminative capability by widening the distinction between reconstructed abnormal samples and their normal counterparts.
arXiv Detail & Related papers (2023-11-04T11:19:03Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
- StyleGenes: Discrete and Efficient Latent Distributions for GANs [149.0290830305808]
We propose a discrete latent distribution for Generative Adversarial Networks (GANs).
Instead of drawing latent vectors from a continuous prior, we sample from a finite set of learnable latents.
We take inspiration from the encoding of information in biological organisms.
arXiv Detail & Related papers (2023-04-30T23:28:46Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
- Relaxed-Responsibility Hierarchical Discrete VAEs [3.976291254896486]
We introduce Relaxed-Responsibility Vector-Quantisation, a novel way to parameterise discrete latent variables.
We achieve state-of-the-art bits-per-dim results for various standard datasets.
arXiv Detail & Related papers (2020-07-14T19:10:05Z)
- Feature Transformation Ensemble Model with Batch Spectral Regularization for Cross-Domain Few-Shot Classification [66.91839845347604]
We propose an ensemble prediction model by performing diverse feature transformations after a feature extraction network.
We use a batch spectral regularization term to suppress the singular values of the feature matrix during pre-training to improve the generalization ability of the model.
The proposed model can then be fine-tuned in the target domain to address few-shot classification.
arXiv Detail & Related papers (2020-05-18T05:31:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.