Related papers: TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation

TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation

URL: http://arxiv.org/abs/2011.13527v1
Date: Fri, 27 Nov 2020 02:26:15 GMT
Title: TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation
Authors: Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee, Yun-Nung Chen
Abstract summary: TaylorGAN is a novel approach to score function-based natural language generation. It augments the gradient estimation by off-policy update and the first-order Taylor expansion. It enables us to train NLG models from scratch with smaller batch size.
Score: 79.4205462326301
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Score function-based natural language generation (NLG) approaches such as REINFORCE, in general, suffer from low sample efficiency and training instability problems. This is mainly due to the non-differentiable nature of the discrete space sampling and thus these methods have to treat the discriminator as a black box and ignore the gradient information. To improve the sample efficiency and reduce the variance of REINFORCE, we propose a novel approach, TaylorGAN, which augments the gradient estimation by off-policy update and the first-order Taylor expansion. This approach enables us to train NLG models from scratch with smaller batch size -- without maximum likelihood pre-training, and outperforms existing GAN-based methods on multiple metrics of quality and diversity. The source code and data are available at https://github.com/MiuLab/TaylorGAN

Related papers

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. We aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Our algorithm integrates soft value functions, which looks ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
Resource-Adaptive Newton's Method for Distributed Learning [16.588456212160928]
This paper introduces a novel and efficient algorithm called RANL, which overcomes the limitations of Newton's method. Unlike traditional first-order methods, RANL exhibits remarkable independence from the condition number of the problem.
arXiv Detail & Related papers (2023-08-20T04:01:30Z)
Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of exchanges. We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method. We develop practical bounds to apply it to language generation. We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
Variational Sparse Coding with Learned Thresholding [6.737133300781134]
We propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples. We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation.
arXiv Detail & Related papers (2022-05-07T14:49:50Z)
Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation. We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
Research of Damped Newton Stochastic Gradient Descent Method for Neural Network Training [6.231508838034926]
First-order methods like gradient descent(SGD) are recently the popular optimization method to train deep neural networks (DNNs) In this paper, we propose the Damped Newton Descent(DN-SGD) and Gradient Descent Damped Newton(SGD-DN) methods to train DNNs for regression problems with Mean Square Error(MSE) and classification problems with Cross-Entropy Loss(CEL) Our methods just accurately compute a small part of the parameters, which greatly reduces the computational cost and makes the learning process much faster and more accurate than SGD.
arXiv Detail & Related papers (2021-03-31T02:07:18Z)
Population Gradients improve performance across data-sets and architectures in object classification [6.17047113475566]
We present a new method to calculate the gradients while training Neural Networks (NNs) It significantly improves final performance across architectures, data-sets, hyper- parameter values, training length, and model sizes. Besides being effective in the wide array situations that we have tested, the increase in performance (e.g. F1) is as high or higher than this one of all the other widespread performance-improving methods.
arXiv Detail & Related papers (2020-10-23T09:40:23Z)
Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm. When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as least realistic' We show that this top-k update' procedure is a generally applicable improvement.
arXiv Detail & Related papers (2020-02-14T19:27:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.