Related papers: Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration

Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration

URL: http://arxiv.org/abs/2410.13201v1
Date: Thu, 17 Oct 2024 04:06:02 GMT
Title: Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
Authors: Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Chen-Sheng Gu, Ling Zhen Li, Ray-I Chang, Hung-yi Lee,
Abstract summary: We propose a scheduler-exploiter S2S-Diffusion paradigm designed to overcome the limitations of existing S2S-Diffusion models. We employ Meta-Exploration to train an additional scheduler model dedicated to scheduling contextualized noise for each sentence. Our exploiter model, an S2S-Diffusion model, leverages the noise scheduled by our scheduler model for updating and generation.
Score: 53.63593099509471
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The diffusion model, a new generative modeling paradigm, has achieved significant success in generating images, audio, video, and text. It has been adapted for sequence-to-sequence text generation (Seq2Seq) through DiffuSeq, termed S2S Diffusion. Existing S2S-Diffusion models predominantly rely on fixed or hand-crafted rules to schedule noise during the diffusion and denoising processes. However, these models are limited by non-contextualized noise, which fails to fully consider the characteristics of Seq2Seq tasks. In this paper, we propose the Meta-DiffuB framework - a novel scheduler-exploiter S2S-Diffusion paradigm designed to overcome the limitations of existing S2S-Diffusion models. We employ Meta-Exploration to train an additional scheduler model dedicated to scheduling contextualized noise for each sentence. Our exploiter model, an S2S-Diffusion model, leverages the noise scheduled by our scheduler model for updating and generation. Meta-DiffuB achieves state-of-the-art performance compared to previous S2S-Diffusion models and fine-tuned pre-trained language models (PLMs) across four Seq2Seq benchmark datasets. We further investigate and visualize the impact of Meta-DiffuB's noise scheduling on the generation of sentences with varying difficulties. Additionally, our scheduler model can function as a "plug-and-play" model to enhance DiffuSeq without the need for fine-tuning during the inference stage.

Related papers

[MASK] is All You Need [28.90875822599164]
We propose using discrete-state models to connect Masked Generative and Non-autoregressive Diffusion models. By leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models.
arXiv Detail & Related papers (2024-12-09T18:59:56Z)
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models [12.907590808274358]
We propose a novel Rich-contextual Diffusion Models (RCDMs) to enhance story generation's semantic consistency and temporal consistency. RCDMs can generate consistent stories with a single forward inference compared to autoregressive models.
arXiv Detail & Related papers (2024-07-02T17:58:07Z)
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS [0.0]
The diffusion model is capable of generating high-quality data through a probabilistic approach. It suffers from the drawback of slow generation speed due to the requirement of a large number of time steps. We propose a speech synthesis model with two discriminators: a diffusion discriminator for learning the distribution of the reverse process and a spectrogram discriminator for learning the distribution of the generated data.
arXiv Detail & Related papers (2023-08-03T07:22:04Z)
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding [57.42429912884543]
We propose Diff-LM-Speech, Tetra-Diff-Speech and Tri-Diff-Speech to solve high dimensionality and waveform distortion problems. We also introduce a prompt encoder structure based on a variational autoencoder and a prosody bottleneck to improve prompt representation ability. Experimental results show that our proposed methods outperform baseline methods.
arXiv Detail & Related papers (2023-07-28T11:20:23Z)
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA) Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models. To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z)
Unleashing the True Potential of Sequence-to-Sequence Models for Sequence Tagging and Structure Parsing [18.441585314765632]
Sequence-to-Sequence (S2S) models have achieved remarkable success on various text generation tasks. We present a systematic study of S2S modeling using contained decoding on four core tasks.
arXiv Detail & Related papers (2023-02-05T01:37:26Z)
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models [15.913828295673705]
DiffuSeq is a diffusion model designed for sequence-to-sequence (Seq2Seq) text generation tasks. We show that DiffuSeq achieves comparable or even better performance than six established baselines. A theoretical analysis reveals the connection between DiffuSeq and autoregressive/non-autoregressive models.
arXiv Detail & Related papers (2022-10-17T10:49:08Z)
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech [63.780196620966905]
We propose ProDiff, on progressive fast diffusion model for high-quality text-to-speech. ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling. Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms. ProDiff enables a sampling speed of 24x faster than real-time on a single NVIDIA 2080Ti GPU.
arXiv Detail & Related papers (2022-07-13T17:45:43Z)
Robust Face Anti-Spoofing with Dual Probabilistic Modeling [49.14353429234298]
We propose a unified framework called Dual Probabilistic Modeling (DPM), with two dedicated modules, DPM-LQ (Label Quality aware learning) and DPM-DQ (Data Quality aware learning) DPM-LQ is able to produce robust feature representations without overfitting to the distribution of noisy semantic labels. DPM-DQ can eliminate data noise from False Reject' and False Accept' during inference by correcting the prediction confidence of noisy data based on its quality distribution.
arXiv Detail & Related papers (2022-04-27T03:44:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.