Masked Generative Modeling with Enhanced Sampling Scheme
- URL: http://arxiv.org/abs/2309.07945v1
- Date: Thu, 14 Sep 2023 09:42:13 GMT
- Title: Masked Generative Modeling with Enhanced Sampling Scheme
- Authors: Daesoo Lee, Erlend Aune, and Sara Malacarne
- Abstract summary: Enhanced Sampling Scheme (ESS) ensures both sample diversity and fidelity.
ESS consists of three stages: Naive Iterative Decoding, Critical Reverse Sampling, and Critical Resampling.
We demonstrate significant performance gains of ESS in both unconditional sampling and class-conditional sampling.
- Score: 1.3927943269211591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel sampling scheme for masked non-autoregressive
generative modeling. We identify the limitations of TimeVQVAE, MaskGIT, and
Token-Critic in their sampling processes, and propose Enhanced Sampling Scheme
(ESS) to overcome these limitations. ESS explicitly ensures both sample
diversity and fidelity, and consists of three stages: Naive Iterative Decoding,
Critical Reverse Sampling, and Critical Resampling. ESS starts by sampling a
token set using the naive iterative decoding as proposed in MaskGIT, ensuring
sample diversity. Then, the token set undergoes the critical reverse sampling,
masking tokens leading to unrealistic samples. After that, critical resampling
reconstructs masked tokens until the final sampling step is reached to ensure
high fidelity. Critical resampling uses confidence scores obtained from a
self-Token-Critic to better measure the realism of sampled tokens, while
critical reverse sampling uses the structure of the quantized latent vector
space to discover unrealistic sample paths. We demonstrate significant
performance gains of ESS in both unconditional sampling and class-conditional
sampling using all the 128 datasets in the UCR Time Series archive.
Related papers
- Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS)
MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition.
arXiv Detail & Related papers (2024-07-11T14:36:53Z) - Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z) - Sampling From Autoencoders' Latent Space via Quantization And
Probability Mass Function Concepts [1.534667887016089]
We introduce a novel post-training sampling algorithm rooted in the concept of probability mass functions, coupled with a quantization process.
Our proposed algorithm establishes a vicinity around each latent vector from the input data and then proceeds to draw samples from these defined neighborhoods.
This strategic approach ensures that the sampled latent vectors predominantly inhabit high-probability regions, which, in turn, can be effectively transformed into authentic real-world images.
arXiv Detail & Related papers (2023-08-21T13:18:12Z) - Conditional Sampling of Variational Autoencoders via Iterated
Approximate Ancestral Sampling [7.357511266926065]
Conditional sampling of variational autoencoders (VAEs) is needed in various applications, such as missing data imputation, but is computationally intractable.
A principled choice forally exact conditional sampling is Metropolis-within-Gibbs (MWG)
arXiv Detail & Related papers (2023-08-17T16:08:18Z) - Adaptive Siamese Tracking with a Compact Latent Network [219.38172719948048]
We present an intuitive viewing to simplify the Siamese-based trackers by converting the tracking task to a classification.
Under this viewing, we perform an in-depth analysis for them through visual simulations and real tracking examples.
We apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN.
arXiv Detail & Related papers (2023-02-02T08:06:02Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Sample Prior Guided Robust Model Learning to Suppress Noisy Labels [8.119439844514973]
We propose PGDF, a novel framework to learn a deep model to suppress noise by generating the samples' prior knowledge.
Our framework can save more informative hard clean samples into the cleanly labeled set.
We evaluate our method using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M.
arXiv Detail & Related papers (2021-12-02T13:09:12Z) - Hardness of Samples Is All You Need: Protecting Deep Learning Models
Using Hardness of Samples [1.2074552857379273]
We show that the hardness degree of model extraction attacks samples is distinguishable from the hardness degree of normal samples.
We propose Hardness-Oriented Detection Approach (HODA) to detect the sample sequences of model extraction attacks.
arXiv Detail & Related papers (2021-06-21T22:03:31Z) - iNNformant: Boundary Samples as Telltale Watermarks [68.8204255655161]
We show that it is possible to generate sets of boundary samples which can identify any of four tested microarchitectures.
These sets can be built to not contain any sample with a worse peak signal-to-noise ratio than 70dB.
arXiv Detail & Related papers (2021-06-14T11:18:32Z) - Generating diverse and natural text-to-speech samples using a quantized
fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.