On the Relation between Quality-Diversity Evaluation and
Distribution-Fitting Goal in Text Generation
- URL: http://arxiv.org/abs/2007.01488v2
- Date: Wed, 19 Aug 2020 03:37:59 GMT
- Title: On the Relation between Quality-Diversity Evaluation and
Distribution-Fitting Goal in Text Generation
- Authors: Jianing Li, Yanyan Lan, Jiafeng Guo, Xueqi Cheng
- Abstract summary: We show that a linear combination of quality and diversity constitutes a divergence metric between the generated distribution and the real distribution.
We propose CR/NRR as a substitute for quality/diversity metric pair.
- Score: 86.11292297348622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of text generation models is to fit the underlying real probability
distribution of text. For performance evaluation, quality and diversity metrics
are usually applied. However, it is still not clear to what extent the
quality-diversity evaluation can reflect the distribution-fitting goal. In this
paper, we reveal this relation through a theoretical analysis. We prove that
under certain conditions, a linear combination of quality and diversity
constitutes a divergence metric between the generated distribution and the real
distribution. We also show that the commonly used BLEU/Self-BLEU metric pair
fails to match any divergence metric, and we thus propose CR/NRR as a
substitute quality/diversity metric pair.
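As a concrete illustration of the quality/diversity pair the abstract refers to, here is a minimal sketch of BLEU (quality, measured against real text) and Self-BLEU (similarity among generated samples, i.e., the inverse of diversity) using NLTK. The whitespace tokenization, smoothing method, and toy corpora are illustrative assumptions, not choices from the paper.

```python
# Minimal BLEU / Self-BLEU sketch; tokenization and smoothing are illustrative.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1

def bleu_quality(generated, references):
    """Average BLEU of each generated sample against the real corpus (quality)."""
    refs = [r.split() for r in references]
    scores = [sentence_bleu(refs, g.split(), smoothing_function=smooth)
              for g in generated]
    return sum(scores) / len(scores)

def self_bleu(generated):
    """Average BLEU of each sample against the others (higher = less diverse)."""
    tokenized = [g.split() for g in generated]
    scores = []
    for i, hyp in enumerate(tokenized):
        others = tokenized[:i] + tokenized[i + 1:]
        scores.append(sentence_bleu(others, hyp, smoothing_function=smooth))
    return sum(scores) / len(scores)

generated = ["the cat sat on the mat", "a dog ran in the park",
             "the cat sat on the mat"]
references = ["the cat sat on the mat", "dogs run in parks all day"]
print("BLEU (quality):     ", bleu_quality(generated, references))
print("Self-BLEU (~1/div.):", self_bleu(generated))
```

The paper's negative result is about exactly this pair: no linear combination of BLEU and Self-BLEU behaves like a divergence between the generated and real distributions.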
Related papers
- Theoretical Aspects of Bias and Diversity in Minimum Bayes Risk Decoding [32.02732402635305]
Minimum Bayes Risk (MBR) decoding can mitigate errors in estimating the quality of generated hypotheses by utilizing automatic evaluation metrics and model-generated pseudo-references.
We decompose errors in the estimated quality of generated hypotheses into two key factors: bias, which reflects the closeness between the utility function and human evaluations, and diversity, which represents the variation in the utility function's quality estimates.
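For intuition, a hedged sketch of MBR decoding with model-generated pseudo-references follows; the token-overlap utility is an illustrative stand-in for a real metric such as BLEU or COMET, and all names are assumptions.

```python
# Sketch of MBR decoding: pick the candidate with the highest expected utility
# over pseudo-references sampled from the model. The toy F1-overlap utility
# below stands in for a real evaluation metric.
def utility(hypothesis, reference):
    h, r = set(hypothesis.split()), set(reference.split())
    if not h or not r:
        return 0.0
    return 2 * len(h & r) / (len(h) + len(r))

def mbr_decode(candidates, pseudo_references):
    def expected_utility(hyp):
        return sum(utility(hyp, ref) for ref in pseudo_references) / len(pseudo_references)
    return max(candidates, key=expected_utility)

# In practice, candidates and pseudo-references are both sampled from the model.
samples = ["the meeting is at noon", "meeting at noon", "lunch is at noon"]
print(mbr_decode(samples, pseudo_references=samples))
```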
arXiv Detail & Related papers (2024-10-19T07:32:10Z)
- Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models [7.770029179741429]
We propose P-precision and P-recall (PP&PR), based on a probabilistic approach that addresses these problems.
We show that our PP&PR provide more reliable estimates for comparing fidelity and diversity than the existing metrics.
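As background for what PP&PR aims to make more reliable, below is a sketch of the k-NN-based precision/recall that the probabilistic variants refine; the Gaussian toy data, k, and feature space are assumptions, and this is not the PP&PR computation itself.

```python
# k-NN support estimation for precision (fidelity) and recall (diversity);
# a sketch of the classical baseline, not of P-precision/P-recall.
import numpy as np

def knn_radius(points, k=3):
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the zero self-distance

def support_fraction(queries, points, radii):
    """Fraction of queries inside at least one k-NN ball around `points`."""
    d = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 2))
fake = rng.normal(0.5, 1.0, size=(200, 2))
precision = support_fraction(fake, real, knn_radius(real))  # fakes on real manifold
recall = support_fraction(real, fake, knn_radius(fake))     # reals covered by fakes
print(f"precision={precision:.2f} recall={recall:.2f}")
```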
arXiv Detail & Related papers (2023-09-04T13:19:17Z)
- Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation [55.92852268168816]
N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks.
Recent studies have revealed a weak correlation between these matching-based metrics and human evaluations.
We propose to utilize multiple references to enhance the consistency between these metrics and human evaluations.
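NLTK's sentence_bleu already accepts several references per hypothesis, so a minimal sketch of the multiple-reference idea needs no extra machinery; the sentences below are toy examples.

```python
# With more references, the hypothesis gets more chances to match a valid phrasing.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1
hypothesis = "the quick brown fox jumped over the lazy dog".split()
single_ref = ["the fast brown fox jumped over a lazy dog".split()]
multi_refs = single_ref + ["a quick brown fox leaps over the lazy dog".split()]

print("1 reference :", sentence_bleu(single_ref, hypothesis, smoothing_function=smooth))
print("2 references:", sentence_bleu(multi_refs, hypothesis, smoothing_function=smooth))
```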
arXiv Detail & Related papers (2023-08-06T14:49:26Z)
- On the Efficacy of Sampling Adapters [82.5941326570812]
We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
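A minimal sketch of one common sampling adapter, top-p (nucleus) truncation, makes the precision/recall trade-off concrete: cutting the tail gives up coverage of the true distribution (recall) to avoid sampling unlikely tokens (precision). The toy distribution is an assumption.

```python
# Top-p (nucleus) adapter: keep the smallest prefix of tokens, sorted by
# probability, whose cumulative mass reaches p, then renormalize.
import numpy as np

def top_p_adapter(probs, p=0.9):
    order = np.argsort(probs)[::-1]              # tokens by descending probability
    cum = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cum, p)) + 1]
    adapted = np.zeros_like(probs)
    adapted[keep] = probs[keep]
    return adapted / adapted.sum()               # renormalize over kept tokens

vocab_probs = np.array([0.42, 0.30, 0.15, 0.08, 0.04, 0.01])
print(top_p_adapter(vocab_probs, p=0.9))         # tail tokens get probability 0
```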
arXiv Detail & Related papers (2023-07-07T17:59:12Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply total variation distance (TVD) to language generation.
We introduce the TaiLr objective, which balances the tradeoff of estimating TVD.
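For reference, the total variation distance between two distributions over a shared support is TVD(p, q) = 0.5 * Σ_x |p(x) − q(x)|; the toy next-token distributions below are assumptions.

```python
# TVD directly exposes probability mass the model puts where the data does not.
import numpy as np

def tvd(p, q):
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())

real  = np.array([0.5, 0.3, 0.2, 0.0])  # true next-token distribution
model = np.array([0.4, 0.3, 0.2, 0.1])  # model leaks mass onto the tail
print(tvd(real, model))                 # 0.1
```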
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
Accumulated prediction sensitivity measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
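As a rough illustration only: the core idea, a prediction's sensitivity to perturbations of an input feature, can be sketched with a finite difference. This is not the paper's accumulated prediction sensitivity formula; the toy model and epsilon are assumptions.

```python
# Finite-difference sensitivity of a prediction to one input feature.
import numpy as np

def prediction_sensitivity(predict, x, feature_idx, eps=1e-3):
    x_plus = x.copy()
    x_plus[feature_idx] += eps
    return abs(predict(x_plus) - predict(x)) / eps

w = np.array([2.0, 0.0, -1.0])  # toy logistic model; feature 1 is ignored
predict = lambda x: float(1 / (1 + np.exp(-w @ x)))
x = np.array([0.5, 1.0, 0.2])
print([round(prediction_sensitivity(predict, x, i), 3) for i in range(3)])
```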
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
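Binary DRE is often reduced to probabilistic classification: a classifier c(x) = P(label = 1 | x) trained to separate samples of p from samples of q yields r(x) = p(x)/q(x) ≈ c(x)/(1 − c(x)) when the classes are balanced. A minimal sketch with scikit-learn, using toy Gaussians as an assumption:

```python
# Density ratio via the classifier trick: r(x) ~ c(x) / (1 - c(x)).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
xp = rng.normal(0.0, 1.0, size=(1000, 1))  # samples from p
xq = rng.normal(1.0, 1.0, size=(1000, 1))  # samples from q

X = np.vstack([xp, xq])
y = np.concatenate([np.ones(1000), np.zeros(1000)])  # 1 = from p, 0 = from q
clf = LogisticRegression().fit(X, y)

c = clf.predict_proba(np.array([[0.0]]))[0, 1]
print("estimated p(0)/q(0):", c / (1 - c))  # true ratio exp(0.5) ~ 1.65
```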
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
- On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach [3.2326259807823026]
We analyze an alternative PMI-based metric to quantify biases in texts.
It can be expressed as a function of conditional probabilities, which provides a simple interpretation in terms of word co-occurrences.
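The conditional-probability reading follows from PMI(w, c) = log p(w, c) / (p(w) p(c)) = log p(w | c) / p(w); below is a minimal sketch over toy document-level co-occurrences (the corpus is an assumption).

```python
# PMI from document-level co-occurrence counts.
import math

docs = [
    ["nurse", "she", "hospital"],
    ["nurse", "she", "care"],
    ["engineer", "he", "office"],
    ["engineer", "he", "code"],
]

def pmi(word, context, docs):
    n = len(docs)
    p_w = sum(word in d for d in docs) / n
    p_c = sum(context in d for d in docs) / n
    p_wc = sum(word in d and context in d for d in docs) / n
    return math.log(p_wc / (p_w * p_c)) if p_wc > 0 else float("-inf")

print("PMI(nurse, she) =", pmi("nurse", "she", docs))  # > 0: association
print("PMI(nurse, he)  =", pmi("nurse", "he", docs))   # -inf: never co-occur
```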
arXiv Detail & Related papers (2021-04-13T19:34:17Z)
- Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression [0.8574682463936005]
We propose a novel forest construction for multivariate responses based on their joint conditional distribution.
The code is available as the drf package for both Python and R.
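The forest-weighting idea can be approximated with a standard forest: training points that share leaves with a query x receive weight, and the conditional distribution of the response given x is their weighted empirical distribution. The sketch below uses scikit-learn's variance-based splitting rather than DRF's distributional criterion, and the toy data are assumptions.

```python
# Forest-weighted empirical estimate of the conditional distribution of Y | X=x.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 1))
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 0])]) + rng.normal(0, 0.1, (500, 2))
forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=10).fit(X, Y)

def conditional_weights(forest, X_train, x):
    """weight_i = mean over trees of 1[leaf(x_i) == leaf(x)] / leaf size."""
    leaves_train = forest.apply(X_train)          # (n_train, n_trees)
    leaves_x = forest.apply(x.reshape(1, -1))[0]  # (n_trees,)
    same = leaves_train == leaves_x[None, :]
    return (same / same.sum(axis=0, keepdims=True)).mean(axis=1)

w = conditional_weights(forest, X, np.array([0.5]))  # weights sum to 1
print("E[Y | X=0.5] ~", w @ Y)  # close to (sin(0.5), cos(0.5))
```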
arXiv Detail & Related papers (2020-05-29T09:05:00Z)
- Reliable Fidelity and Diversity Metrics for Generative Models [30.941563781926202]
The most widely used metric for measuring the similarity between real and generated images has been the Fréchet Inception Distance (FID) score.
We show that even the latest version of the precision and recall metrics is not reliable yet.
We propose density and coverage metrics that solve the above issues.
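Below is a sketch of density and coverage following the commonly cited definitions, with k, the feature space, and toy Gaussian embeddings as assumptions: both metrics count generated samples inside k-NN balls centered at real samples, which the paper argues makes them more reliable than k-NN precision/recall.

```python
# Density and coverage over k-NN balls centered at real samples.
import numpy as np

def knn_radii(real, k=5):
    d = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # k-th neighbor; column 0 is the self-distance

def density_coverage(real, fake, k=5):
    radii = knn_radii(real, k)
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    inside = d <= radii[None, :]                 # (n_fake, n_real)
    density = inside.sum() / (k * len(fake))     # can exceed 1 for dense fakes
    coverage = float(inside.any(axis=0).mean())  # reals with >= 1 fake nearby
    return density, coverage

rng = np.random.default_rng(0)
real = rng.normal(0, 1, size=(300, 2))
fake = rng.normal(0, 1, size=(300, 2))
print(density_coverage(real, fake))  # both near 1 when distributions match
```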
arXiv Detail & Related papers (2020-02-23T00:50:01Z)