TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural
Language Generation
- URL: http://arxiv.org/abs/2011.13527v1
- Date: Fri, 27 Nov 2020 02:26:15 GMT
- Title: TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural
Language Generation
- Authors: Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee, Yun-Nung Chen
- Abstract summary: TaylorGAN is a novel approach to score function-based natural language generation.
It augments the gradient estimation by off-policy update and the first-order Taylor expansion.
It enables us to train NLG models from scratch with smaller batch size.
- Score: 79.4205462326301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Score function-based natural language generation (NLG) approaches such as
REINFORCE, in general, suffer from low sample efficiency and training
instability problems. This is mainly due to the non-differentiable nature of
the discrete space sampling and thus these methods have to treat the
discriminator as a black box and ignore the gradient information. To improve
the sample efficiency and reduce the variance of REINFORCE, we propose a novel
approach, TaylorGAN, which augments the gradient estimation by off-policy
update and the first-order Taylor expansion. This approach enables us to train
NLG models from scratch with smaller batch size -- without maximum likelihood
pre-training, and outperforms existing GAN-based methods on multiple metrics of
quality and diversity. The source code and data are available at
https://github.com/MiuLab/TaylorGAN
Related papers
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which looks ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z) - Resource-Adaptive Newton's Method for Distributed Learning [16.588456212160928]
This paper introduces a novel and efficient algorithm called RANL, which overcomes the limitations of Newton's method.
Unlike traditional first-order methods, RANL exhibits remarkable independence from the condition number of the problem.
arXiv Detail & Related papers (2023-08-20T04:01:30Z) - Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized
Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z) - Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply it to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z) - Invariance Learning in Deep Neural Networks with Differentiable Laplace
Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z) - Research of Damped Newton Stochastic Gradient Descent Method for Neural
Network Training [6.231508838034926]
First-order methods like gradient descent(SGD) are recently the popular optimization method to train deep neural networks (DNNs)
In this paper, we propose the Damped Newton Descent(DN-SGD) and Gradient Descent Damped Newton(SGD-DN) methods to train DNNs for regression problems with Mean Square Error(MSE) and classification problems with Cross-Entropy Loss(CEL)
Our methods just accurately compute a small part of the parameters, which greatly reduces the computational cost and makes the learning process much faster and more accurate than SGD.
arXiv Detail & Related papers (2021-03-31T02:07:18Z) - Population Gradients improve performance across data-sets and
architectures in object classification [6.17047113475566]
We present a new method to calculate the gradients while training Neural Networks (NNs)
It significantly improves final performance across architectures, data-sets, hyper- parameter values, training length, and model sizes.
Besides being effective in the wide array situations that we have tested, the increase in performance (e.g. F1) is as high or higher than this one of all the other widespread performance-improving methods.
arXiv Detail & Related papers (2020-10-23T09:40:23Z) - Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad
Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm.
When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as least realistic'
We show that this top-k update' procedure is a generally applicable improvement.
arXiv Detail & Related papers (2020-02-14T19:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.