On the Relation between Quality-Diversity Evaluation and
Distribution-Fitting Goal in Text Generation
- URL: http://arxiv.org/abs/2007.01488v2
- Date: Wed, 19 Aug 2020 03:37:59 GMT
- Title: On the Relation between Quality-Diversity Evaluation and
Distribution-Fitting Goal in Text Generation
- Authors: Jianing Li, Yanyan Lan, Jiafeng Guo, Xueqi Cheng
- Abstract summary: We show that a linear combination of quality and diversity constitutes a divergence metric between the generated distribution and the real distribution.
We propose CR/NRR as a substitute quality/diversity metric pair.
- Score: 86.11292297348622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of text generation models is to fit the underlying real probability
distribution of text. For performance evaluation, quality and diversity metrics
are usually applied. However, it is still not clear to what extent the
quality-diversity evaluation can reflect the distribution-fitting goal. In this
paper, we try to reveal this relation through a theoretical analysis. We prove that
under certain conditions, a linear combination of quality and diversity
constitutes a divergence metric between the generated distribution and the real
distribution. We also show that the commonly used BLEU/Self-BLEU metric pair
fails to match any divergence metric, and thus propose CR/NRR as a substitute
quality/diversity metric pair.
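Illustrative note (not part of the paper): one concrete instantiation of the "linear combination" result is to measure quality as the expected log real-probability of generated samples and diversity as the entropy of the generated distribution; their sum then equals the negative KL divergence between the generated and real distributions. The short Python sketch below checks this identity numerically on toy discrete distributions. The specific quality/diversity definitions here are assumptions chosen for illustration, not necessarily the paper's CR/NRR construction.

    # Illustration only: with quality(Q) = E_{x~Q}[log P(x)] and diversity(Q) = H(Q),
    # quality + diversity = -KL(Q || P), i.e. an equally weighted combination of the
    # two scores is (the negative of) a divergence between generated Q and real P.
    import numpy as np

    def quality(q, p):
        # expected log real-probability of samples drawn from the generated distribution q
        return float(np.sum(q * np.log(p)))

    def diversity(q):
        # Shannon entropy of the generated distribution q
        return float(-np.sum(q * np.log(q)))

    def kl(q, p):
        # KL divergence KL(q || p)
        return float(np.sum(q * np.log(q / p)))

    rng = np.random.default_rng(0)
    p = rng.dirichlet(np.ones(5))  # toy "real" distribution
    q = rng.dirichlet(np.ones(5))  # toy "generated" distribution

    assert np.isclose(quality(q, p) + diversity(q), -kl(q, p))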
Related papers
- A Unifying Information-theoretic Perspective on Evaluating Generative Models [5.524685807042777]
Several recent approaches utilize "precision" and "recall," borrowed from the classification domain, to individually quantify the output fidelity (realism) and output diversity (representation of the real data variation)
We unify a class of kth-nearest-neighbors (kNN)-based metrics under an information-theoretic lens using approaches from kNN density estimation.
We propose a tri-dimensional metric composed of Precision Cross-Entropy (PCE), Recall Cross-Entropy (RCE), and Recall Entropy (RE)
arXiv Detail & Related papers (2024-12-18T21:17:02Z) - Theoretical Aspects of Bias and Diversity in Minimum Bayes Risk Decoding [32.02732402635305]
Minimum Bayes Risk (MBR) decoding can mitigate this problem by utilizing automatic evaluation metrics and model-generated pseudo-references.
We decompose errors in the estimated quality of generated hypotheses into two key factors: bias, which reflects the closeness between utility functions and human evaluations, and diversity, which represents the variation in the estimated quality of utility functions.
arXiv Detail & Related papers (2024-10-19T07:32:10Z) - A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics [4.757470449749877]
We show that these metrics can be unified under a general framework of kernel-based two-sample statistics.
This paper establishes a novel uniform concentration inequality for the aforementioned kernel-based statistics.
As illustrative applications, we demonstrate how these bounds facilitate the derivation of error bounds for procedures such as distance covariance-based dimension reduction.
arXiv Detail & Related papers (2024-05-22T22:41:56Z) - Probabilistic Precision and Recall Towards Reliable Evaluation of
Generative Models [7.770029179741429]
We propose P-precision and P-recall (PP&PR), based on a probabilistic approach that addresses these problems.
We show that our PP&PR provide more reliable estimates for comparing fidelity and diversity than the existing metrics.
arXiv Detail & Related papers (2023-09-04T13:19:17Z) - Towards Multiple References Era -- Addressing Data Leakage and Limited
Reference Diversity in NLG Evaluation [55.92852268168816]
N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks.
Recent studies have revealed a weak correlation between these matching-based metrics and human evaluations.
We propose to utilize multiple references to enhance the consistency between these metrics and human evaluations; a minimal BLEU/Self-BLEU computation sketch follows this list.
arXiv Detail & Related papers (2023-08-06T14:49:26Z) - On the Efficacy of Sampling Adapters [82.5941326570812]
We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
arXiv Detail & Related papers (2023-07-07T17:59:12Z) - Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply the total variation distance (TVD) to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z) - Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Distributional Random Forests: Heterogeneity Adjustment and Multivariate
Distributional Regression [0.8574682463936005]
We propose a novel forest construction for multivariate responses based on their joint conditional distribution.
The code is available as the drf package for both Python and R.
arXiv Detail & Related papers (2020-05-29T09:05:00Z)
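Illustrative note (not from any of the papers above): the BLEU/Self-BLEU quality/diversity pair discussed in the abstract, and the n-gram matching metrics mentioned in the related work, are commonly computed along the following lines. This is a minimal sketch assuming nltk's sentence_bleu; the tokenization, corpus handling, and smoothing choices are simplifications, not the exact protocol of any listed paper.

    # Quality: average BLEU of each generated sample against a shared reference set.
    # Diversity proxy: 1 - Self-BLEU, where each generated sample is scored against
    # the other generated samples used as references (lower Self-BLEU = more diverse).
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    _smooth = SmoothingFunction().method1

    def bleu_quality(generated, references):
        # average similarity of generated samples to the reference corpus
        scores = [sentence_bleu(references, hyp, smoothing_function=_smooth)
                  for hyp in generated]
        return sum(scores) / len(scores)

    def self_bleu_diversity(generated):
        # 1 - Self-BLEU: each sample is scored against the remaining samples
        scores = [sentence_bleu(generated[:i] + generated[i + 1:], hyp,
                                smoothing_function=_smooth)
                  for i, hyp in enumerate(generated)]
        return 1.0 - sum(scores) / len(scores)

    references = [["the", "cat", "sat", "on", "the", "mat"],
                  ["a", "cat", "was", "sitting", "on", "the", "mat"]]
    generated = [["the", "cat", "sat", "on", "a", "mat"],
                 ["the", "dog", "sat", "on", "the", "mat"],
                 ["a", "cat", "sat", "on", "the", "mat"]]
    print(bleu_quality(generated, references), self_bleu_diversity(generated))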