Distributional Discrepancy: A Metric for Unconditional Text Generation
- URL: http://arxiv.org/abs/2005.01282v2
- Date: Thu, 2 Jul 2020 15:40:14 GMT
- Title: Distributional Discrepancy: A Metric for Unconditional Text Generation
- Authors: Ping Cai, Xingyuan Chen, Peng Jin, Hongjun Wang, Tianrui Li
- Abstract summary: The purpose of unconditional text generation is to train a model with real sentences, then generate novel sentences of the same quality and diversity as the training data.
A novel metric of distributional discrepancy (DD) is designed to evaluate generators based on the discrepancy between the generated and real training sentences.
DD is significantly better than the three existing metrics for ranking these generative models.
- Score: 6.6159481812419045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The purpose of unconditional text generation is to train a model with real
sentences, then generate novel sentences of the same quality and diversity as
the training data. However, when different metrics are used for comparing the
methods of unconditional text generation, contradictory conclusions are drawn.
The difficulty is that both the diversity and quality of the sample should be
considered simultaneously when the models are evaluated. To solve this problem,
a novel metric of distributional discrepancy (DD) is designed to evaluate
generators based on the discrepancy between the generated and real training
sentences. However, the DD cannot be computed directly because the distribution
of real sentences is unavailable. Thus, we propose a method for estimating the
DD by training a neural-network-based text classifier. For comparison, three
existing metrics, bi-lingual evaluation understudy (BLEU) versus self-BLEU,
language model score versus reverse language model score, and Fréchet
embedding distance, along with the proposed DD, are used to evaluate two
popular generative models of long short-term memory and generative pretrained
transformer 2 on both syntactic and real data. Experimental results show that
DD is significantly better than the three existing metrics for ranking these
generative models.
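As a rough sketch of the estimation idea described in the abstract, the snippet below trains a binary classifier to separate real sentences from generated ones and reads a discrepancy score off its held-out accuracy. A TF-IDF plus logistic-regression classifier stands in for the paper's neural-network-based classifier, and the accuracy-above-chance proxy is an illustrative assumption, not the paper's exact estimator.

```python
# Minimal sketch: estimate a distributional-discrepancy-style score by training
# a classifier to tell real sentences from generated ones. A TF-IDF +
# logistic-regression classifier stands in for the neural text classifier used
# in the paper; the "accuracy above chance" proxy is an illustrative assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def estimate_discrepancy(real_sentences, generated_sentences, seed=0):
    texts = list(real_sentences) + list(generated_sentences)
    labels = [1] * len(real_sentences) + [0] * len(generated_sentences)
    x_tr, x_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.3, stratify=labels, random_state=seed)
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(x_tr, y_tr)
    accuracy = clf.score(x_te, y_te)
    # A generator matching the real distribution would leave held-out accuracy
    # near 0.5; the excess over chance serves here as the discrepancy estimate.
    return max(0.0, 2.0 * (accuracy - 0.5))

real = ["the cat sat on the mat", "she read the letter twice", "rain fell all night"]
fake = ["cat the mat on sat the", "letter she the twice read", "night all fell rain"]
print(estimate_discrepancy(real, fake))
```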
Related papers
- Training Implicit Generative Models via an Invariant Statistical Loss [3.139474253994318]
Implicit generative models have the capability to learn arbitrarily complex data distributions.
On the downside, training requires telling real data apart from artificially generated data using adversarial discriminators.
We develop a discriminator-free method for training one-dimensional (1D) generative implicit models.
arXiv Detail & Related papers (2024-02-26T09:32:28Z)
- Enhancing Text Generation with Cooperative Training [23.971227375706327]
Most prevailing methods train generative and discriminative models in isolation, which leaves them unable to adapt to changes in each other.
We introduce a self-consistent learning framework in the text field that involves training a discriminator and generator cooperatively in a closed-loop manner.
Our framework is able to mitigate training instabilities such as mode collapse and non-convergence.
arXiv Detail & Related papers (2023-03-16T04:21:19Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply the total variation distance (TVD) to language generation.
We introduce the TaiLr objective, which balances the tradeoff of estimating TVD.
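For reference, the total variation distance between two distributions over the same support is TV(p, q) = 1/2 * sum_x |p(x) - q(x)|; the toy snippet below computes it for two made-up next-token distributions. The numbers are purely illustrative and unrelated to the paper's experiments.

```python
# Total variation distance between two categorical distributions:
# TV(p, q) = 0.5 * sum_x |p(x) - q(x)|. The toy next-token distributions
# below are made up purely for illustration.
def total_variation(p, q):
    assert abs(sum(p) - 1) < 1e-9 and abs(sum(q) - 1) < 1e-9
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

human_next_token = [0.50, 0.30, 0.15, 0.05]  # hypothetical data distribution
model_next_token = [0.40, 0.25, 0.20, 0.15]  # hypothetical model distribution
print(total_variation(human_next_token, model_next_token))  # -> 0.15
```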
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent [97.64313409741614]
We propose to enforce a consistency property which states that predictions of the model on its own generated data are consistent across time.
We show that our novel training objective yields state-of-the-art results for conditional and unconditional generation on CIFAR-10 and baseline improvements on AFHQ and FFHQ.
arXiv Detail & Related papers (2023-02-17T18:45:04Z)
- On the Blind Spots of Model-Based Evaluation Metrics for Text Generation [79.01422521024834]
We explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics.
We design and synthesize a wide range of potential errors and check whether they result in a commensurate drop in the metric scores.
Our experiments reveal interesting insensitivities, biases, or even loopholes in existing metrics.
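A minimal version of such a stress test might look like the sketch below: apply a synthetic corruption to a sentence, score the clean and corrupted versions with the metric under test, and flag corruptions that the score barely penalizes. The negation-removal corruption and the token-overlap metric are placeholders for illustration, not the paper's actual error taxonomy or the metrics it studies.

```python
# Minimal sketch of a blind-spot check: corrupt a sentence synthetically,
# score the clean and corrupted versions with a candidate metric, and flag
# corruptions the metric fails to penalize. The negation-removal corruption
# and the token-overlap "metric" below are illustrative placeholders.
def drop_negation(text):
    return " ".join(w for w in text.split() if w.lower() not in {"not", "never", "no"})

def token_overlap_metric(candidate, reference):
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    return len(cand & ref) / max(len(ref), 1)

reference = "the vaccine was not effective in the second trial"
clean = "the vaccine was not effective in the second trial"
corrupted = drop_negation(clean)  # meaning flips, surface overlap barely changes

clean_score = token_overlap_metric(clean, reference)
corrupt_score = token_overlap_metric(corrupted, reference)
if clean_score - corrupt_score < 0.2:  # arbitrary sensitivity threshold
    print(f"possible blind spot: {clean_score:.2f} -> {corrupt_score:.2f}")
```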
arXiv Detail & Related papers (2022-12-20T06:24:25Z)
- Unsupervised Mismatch Localization in Cross-Modal Sequential Data [5.932046800902776]
We develop an unsupervised learning algorithm that can infer the relationship between content-mismatched cross-modal data.
We propose a hierarchical Bayesian deep learning model, named mismatch localization variational autoencoder (ML-VAE), that decomposes the generative process of the speech into hierarchically structured latent variables.
Our experimental results show that ML-VAE successfully locates the mismatch between text and speech, without the need for human annotations.
arXiv Detail & Related papers (2022-05-05T14:23:27Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo parallel data with translated source sentences, while natural source sentences are used at inference.
The source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses pseudo parallel data {natural source, translated target} to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
arXiv Detail & Related papers (2020-10-12T06:43:10Z)
- Sparse Text Generation [7.747003493657217]
Current text generators require sampling from a modified softmax, via temperature parameters or ad-hoc truncation techniques, as in top-$k$ or nucleus sampling.
In this paper, we use the recently introduced entmax transformation to train and sample from a sparse language model, avoiding this mismatch.
The result is a text generator with favorable performance in terms of fluency and consistency, fewer repetitions, and n-gram diversity closer to human text.
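To make the truncation step concrete, the sketch below shows the usual top-k and nucleus (top-p) filtering of a next-token distribution; the probabilities are made up, and entmax itself is not reproduced here.

```python
# Illustrative top-k and nucleus (top-p) truncation of a next-token
# distribution. The toy probabilities are made up; entmax (which yields
# sparsity without this ad-hoc truncation) is not reproduced here.
def top_k_filter(probs, k):
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}  # renormalize the survivors

def nucleus_filter(probs, p):
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:  # smallest set of top tokens whose mass reaches p
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

next_token_probs = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
print(top_k_filter(next_token_probs, k=3))      # only the 3 most likely tokens survive
print(nucleus_filter(next_token_probs, p=0.9))  # keeps tokens until 90% mass is covered
```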
arXiv Detail & Related papers (2020-04-06T13:09:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.