InitialGAN: A Language GAN with Completely Random Initialization
- URL: http://arxiv.org/abs/2208.02531v3
- Date: Tue, 18 Jul 2023 08:06:19 GMT
- Title: InitialGAN: A Language GAN with Completely Random Initialization
- Authors: Da Ren and Qing Li
- Abstract summary: Generative Adversarial Networks (GANs) have been shown to have the potential to tackle the notorious exposure bias problem.
Existing language GANs adopt estimators like REINFORCE or continuous relaxations to model word probabilities.
In this work, we present two techniques to tackle these problems: dropout sampling and fully normalized LSTM.
- Score: 7.642043456676739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text generative models trained via Maximum Likelihood Estimation (MLE) suffer
from the notorious exposure bias problem, and Generative Adversarial Networks
(GANs) have been shown to have the potential to tackle it. Existing language
GANs adopt estimators like REINFORCE or continuous relaxations to model word
probabilities. The inherent limitations of such estimators lead current models
to rely on pre-training techniques (MLE pre-training or pre-trained
embeddings). Representation modeling methods, which are free from those
limitations, are seldom explored because of their poor performance
in previous attempts. Our analyses reveal that invalid sampling methods and
unhealthy gradients are the main contributors to such unsatisfactory
performance. In this work, we present two techniques to tackle these problems:
dropout sampling and fully normalized LSTM. Based on these two techniques, we
propose InitialGAN, whose parameters are all randomly initialized. In addition,
we introduce a new evaluation metric, Least Coverage Rate, to better evaluate
the quality of generated samples. The experimental results demonstrate that
InitialGAN outperforms both MLE and the other compared models. To the best of our
knowledge, this is the first time a language GAN has outperformed MLE without using
any pre-training techniques.
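As background for the estimator families named in the abstract, the sketch below shows, for a single generation step, how a REINFORCE-style estimator and a Gumbel-Softmax continuous relaxation each pass gradient signal through a discrete token choice. It is a generic, minimal illustration of those standard estimators, not InitialGAN's representation modeling approach; the tensor shapes are arbitrary and the discriminator reward is a random stand-in.

```python
import torch
import torch.nn.functional as F

batch, vocab_size = 32, 5000
logits = torch.randn(batch, vocab_size, requires_grad=True)  # generator logits for one step

# REINFORCE-style estimator: draw hard tokens, treat the discriminator score
# as a reward, and backpropagate only through the sampled log-probabilities.
probs = F.softmax(logits, dim=-1)
tokens = torch.multinomial(probs, num_samples=1)              # (batch, 1) discrete samples
log_prob = torch.log(probs.gather(1, tokens).squeeze(-1) + 1e-9)
reward = torch.randn(batch)                                   # stand-in for a discriminator score
reinforce_loss = -(reward * log_prob).mean()
reinforce_loss.backward()

# Gumbel-Softmax relaxation: an approximately one-hot, differentiable sample
# that can be fed to the discriminator directly instead of hard token ids.
soft_onehot = F.gumbel_softmax(logits, tau=1.0, hard=False)   # (batch, vocab_size)
```

The known weaknesses of both families (high variance for REINFORCE, bias from the continuous relaxation) are the "inherent limitations" the abstract cites as the reason current language GANs fall back on pre-training.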
Related papers
- Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis [7.234618871984921]
An emerging area of research aims to learn deep generative models with limited training data.
We propose RS-IMLE, a novel approach that changes the prior distribution used for training.
This leads to substantially higher quality image generation compared to existing GAN and IMLE-based methods.
arXiv Detail & Related papers (2024-09-26T00:19:42Z)
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which would require training a separate denoising model, our method reuses the LLM itself and offers significantly better efficiency and flexibility. (A rough sketch of the denoise-then-predict loop follows this entry.)
arXiv Detail & Related papers (2024-04-18T15:47:00Z)
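As a rough illustration of the denoise-then-predict idea in the Self-Denoised Smoothing entry above, the sketch below masks random words, asks the same model to reconstruct the input, predicts on the reconstruction, and majority-votes over several rounds. The `generate` callable, the prompts, and the classification task are placeholder assumptions, not the paper's implementation.

```python
import random
from collections import Counter

def self_denoised_predict(generate, text, num_votes=5, mask_rate=0.3, mask_token="<mask>"):
    """Randomly mask words, let the same LLM reconstruct the input, predict on
    the reconstruction, and majority-vote over several noisy rounds.
    `generate(prompt) -> str` is a placeholder for any instruction-following LLM."""
    words = text.split()
    votes = []
    for _ in range(num_votes):
        noisy = " ".join(w if random.random() > mask_rate else mask_token for w in words)
        denoised = generate(f"Fill in the masked words:\n{noisy}")              # denoise step
        votes.append(generate(f"Classify the sentiment (positive/negative):\n{denoised}"))  # predict step
    return Counter(votes).most_common(1)[0][0]
```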
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. (A schematic of the generate-filter-and-finetune recipe follows this entry.)
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
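The Beyond Human Data entry above describes scaling self-training with scalar feedback instead of human data. The sketch below is a schematic generate-filter-finetune loop in that spirit; `sample`, `reward`, and `finetune` are hypothetical placeholders, and the loop is a simplification rather than the paper's exact ReST$^{EM}$ procedure.

```python
def self_training_loop(model, prompts, sample, reward, finetune, iterations=3, k=8):
    """ReST-style self-training sketch: generate candidates, keep those with
    positive scalar feedback, fine-tune on them, and iterate.
    All callables are placeholders for a concrete LM stack."""
    for _ in range(iterations):
        dataset = []
        for prompt in prompts:
            for completion in sample(model, prompt, k):   # k candidate solutions per prompt
                if reward(prompt, completion) > 0:        # e.g. passes automatic checks
                    dataset.append((prompt, completion))
        model = finetune(model, dataset)                  # train on self-generated, filtered data
    return model
```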
- Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks [5.479797073162603]
This work is inspired by a crucial observation: the parameter space of GANs exhibits meaningful directions that can be leveraged to suppress specific undesired features.
Our proposed method, known as 'Adapt-then-Unlearn,' excels at unlearning such undesirable features while also maintaining the quality of generated samples.
This method unfolds in two stages: in the initial stage, we adapt the pre-trained GAN using negative samples provided by the user, while in the subsequent stage, we focus on unlearning the undesired feature.
arXiv Detail & Related papers (2023-09-25T11:36:20Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply total variation distance (TVD) to language generation.
We introduce the TaiLr objective, which balances the tradeoff involved in estimating TVD. (A toy TVD-versus-KL comparison follows this entry.)
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
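To make the trade-off mentioned in the Tailoring Language Generation Models entry concrete, the toy calculation below compares total variation distance (TVD) with the forward KL divergence minimized by MLE on a small next-token distribution. The numbers are made up purely to show that TVD stays bounded where KL can blow up; this is an illustration of the metric, not the TaiLr objective itself.

```python
import numpy as np

data  = np.array([0.5, 0.3, 0.2, 0.0])   # empirical next-token distribution
model = np.array([0.4, 0.3, 0.1, 0.2])   # model next-token distribution

tvd = 0.5 * np.abs(data - model).sum()   # total variation distance, always in [0, 1]
eps = 1e-12
kl  = np.sum(data * np.log((data + eps) / (model + eps)))  # forward KL, what MLE minimizes

print(f"TVD = {tvd:.3f}, KL = {kl:.3f}")
# Shrinking model[0] toward 0 sends KL toward infinity while TVD stays bounded,
# so a TVD-based objective penalizes such errors less explosively than MLE.
```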
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and to regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms, including L2-SP, Mixout and SMART. (A sketch of such a noise-stability penalty follows this entry.)
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
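The noise stability regularization entry above describes injecting standard Gaussian noise and regularizing the hidden representations of the fine-tuned model. The sketch below shows one plausible form of such a layerwise penalty; the `encoder` interface returning per-layer hidden states and the mean-squared weighting are assumptions, not the paper's released code.

```python
import torch

def noise_stability_penalty(encoder, inputs_embeds, sigma=0.01):
    """Penalize the distance between hidden states computed from clean and
    Gaussian-perturbed input embeddings. `encoder(inputs_embeds)` is assumed
    to return a list/tuple of hidden-state tensors, one per layer."""
    clean_states = encoder(inputs_embeds)
    noisy_states = encoder(inputs_embeds + sigma * torch.randn_like(inputs_embeds))
    penalty = sum(torch.mean((c - n) ** 2) for c, n in zip(clean_states, noisy_states))
    return penalty / len(clean_states)

# total_loss = task_loss + lambda_reg * noise_stability_penalty(encoder, embeds)
```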
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance Guided Stochastic Gradient Descent (IGSGD) method to train models to perform inference directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation [51.091890311312085]
We propose a new training scheme for auto-regressive sequence generative models that is effective and stable when operating in the large sample spaces encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
arXiv Detail & Related papers (2020-07-12T15:31:24Z)
- ColdGANs: Taming Language GANs with Cautious Sampling Strategies [29.943949944682196]
Generative Adversarial Networks (GANs) can mitigate the limitations of MLE training, but the discrete nature of text has hindered their application to language generation.
We show how classical sampling results in unstable training.
We propose to consider alternative exploration strategies in a GAN framework that we name ColdGANs, where we force the sampling to be close to the distribution modes to get smoother learning dynamics.
For the first time, to the best of our knowledge, the proposed language GANs compare favorably to MLE and obtain improvements over the state of the art on three generative tasks. (A minimal temperature-scaled sampling sketch follows this entry.)
arXiv Detail & Related papers (2020-06-08T14:48:14Z)
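The ColdGANs entry forces sampling to stay close to the modes of the generator distribution; the simplest concrete instance of such a cautious sampling strategy is lowering the softmax temperature below 1, as sketched below (a generic illustration, not the authors' exact sampler).

```python
import torch
import torch.nn.functional as F

def cold_sample(logits, temperature=0.5):
    """Sample tokens from temperature-scaled logits. Temperatures below 1
    concentrate probability mass on high-likelihood ('mode') tokens, giving
    the kind of cautious exploration the ColdGANs summary describes."""
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)

tokens = cold_sample(torch.randn(4, 10000), temperature=0.3)  # toy logits over a 10k vocab
```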
This list is automatically generated from the titles and abstracts of the papers on this site.