Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding
- URL: http://arxiv.org/abs/2410.15021v2
- Date: Fri, 06 Jun 2025 20:14:03 GMT
- Title: Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding
- Authors: Hidetaka Kamigaito, Hiroyuki Deguchi, Yusuke Sakai, Katsuhiko Hayashi, Taro Watanabe
- Abstract summary: Inference methods play an important role in eliciting the performance of large language models (LLMs). Currently, LLMs use inference methods that utilize multiple generated samples, which can be derived from Minimum Bayes Risk (MBR) decoding. Previous studies have conducted empirical analyses to clarify the improvements in generation performance achieved by MBR decoding. We offer a new theoretical interpretation of MBR decoding from the perspective of bias-diversity decomposition.
- Score: 32.02732402635305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inference methods play an important role in eliciting the performance of large language models (LLMs). Currently, LLMs use inference methods that utilize multiple generated samples, which can be derived from Minimum Bayes Risk (MBR) decoding. Previous studies have conducted empirical analyses to clarify the improvements in generation performance achieved by MBR decoding and have reported various observations. However, the theoretical underpinnings of these findings remain uncertain. To address this, we offer a new theoretical interpretation of MBR decoding from the perspective of bias-diversity decomposition. In this interpretation, the error in the quality estimation of hypotheses by MBR decoding is decomposed into two main factors: bias, which considers the closeness between the utility function and human evaluation, and diversity, which represents the variability in the quality estimation of the utility function. The theoretical analysis reveals the difficulty of simultaneously improving bias and diversity, confirming the validity of enhancing MBR decoding performance by increasing diversity. Furthermore, we reveal that diversity can explain one aspect of inference scaling laws that describe performance improvement by increasing sample size. Moreover, experiments across multiple NLP tasks yielded results consistent with these theoretical characteristics. Our code is available at https://github.com/naist-nlp/mbr-bias-diversity.
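The decoding rule the abstract describes can be made concrete with a short sketch. The following is a minimal, illustrative implementation of sampling-based MBR decoding, not the authors' released code; `sample_fn` and `utility` are hypothetical placeholders for a model sampler and a pairwise utility function (e.g., a sentence-similarity metric).

```python
# Minimal sketch of sampling-based MBR decoding (illustrative, not the
# authors' implementation from https://github.com/naist-nlp/mbr-bias-diversity).
def mbr_decode(sample_fn, utility, n_samples=32):
    """Pick the sample whose mean utility against the others is highest."""
    hypotheses = [sample_fn() for _ in range(n_samples)]
    best, best_score = None, float("-inf")
    for i, h in enumerate(hypotheses):
        # Monte Carlo estimate of expected utility, treating the other
        # samples as pseudo-references.
        score = sum(utility(h, r) for j, r in enumerate(hypotheses) if j != i)
        score /= n_samples - 1
        if score > best_score:
            best, best_score = h, score
    return best
```

In the paper's decomposition, the error of this averaging step splits into a bias term (how closely `utility` tracks human evaluation) and a diversity term (how much the per-reference estimates vary), which is why drawing more, and more varied, samples can improve the estimate.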
Related papers
- Semantic uncertainty in advanced decoding methods for LLM generation [35.31962554915952]
This study investigates semantic uncertainty in large language model (LLM) outputs across decoding methods, analyzing how different decoding strategies affect both the diversity and reliability of model outputs.
arXiv Detail & Related papers (2025-06-17T10:09:29Z)
- Theoretical Guarantees for Minimum Bayes Risk Decoding [4.421486904657393]
We show that Minimum Bayes Risk (MBR) decoding approaches the optimal solution with high probability at a rate of $O\left(n^{-\frac{1}{2}}\right)$.
This result helps to theoretically explain the strong performance observed in several prior empirical studies on MBR decoding.
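To see why a rate of this form is plausible, note that the Monte Carlo utility estimate is an average of $n$ bounded i.i.d. terms, so a standard Hoeffding-style concentration bound applies (an illustrative argument, not necessarily the paper's proof):

```latex
% Hoeffding bound for the utility estimate; assumes u(h, y) \in [a, b]
% and pseudo-references y_1, \dots, y_n drawn i.i.d. from the model.
\Pr\!\left(\Bigl|\frac{1}{n}\sum_{i=1}^{n} u(h, y_i)
  - \mathbb{E}_{y}\bigl[u(h, y)\bigr]\Bigr| \ge \varepsilon\right)
\;\le\; 2\exp\!\left(-\frac{2n\varepsilon^{2}}{(b-a)^{2}}\right)
```

Holding the failure probability fixed, the deviation $\varepsilon$ shrinks at scale $n^{-\frac{1}{2}}$, matching the stated rate.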
arXiv Detail & Related papers (2025-02-18T09:43:15Z)
- Diversified Sampling Improves Scaling LLM inference [31.18762591875725]
DivSampling is a novel and versatile sampling technique designed to enhance the diversity of candidate solutions.
Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower.
arXiv Detail & Related papers (2025-02-16T07:37:58Z)
- Reconciling Predictive Multiplicity in Practice [43.74883617124773]
Reconcile is a reconciliation procedure to address the model multiplicity (MM) phenomenon.
In this paper, we empirically analyze the Reconcile algorithm using five widely-used fairness datasets.
We extend the Reconcile algorithm to the setting of causal inference, given that different competing estimators can again disagree on specific conditional average treatment effect (CATE) values.
arXiv Detail & Related papers (2025-01-27T22:48:20Z)
- A Theoretical Perspective for Speculative Decoding Algorithm [60.79447486066416]
One effective way to accelerate inference is Speculative Decoding, which employs a small model to sample a sequence of draft tokens and a large model to validate them.
This paper tackles this gap by conceptualizing the decoding problem via a Markov chain abstraction and studying its key properties, output quality and inference acceleration, from a theoretical perspective.
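As background, the draft-and-verify loop this entry refers to can be sketched as follows; `draft_p` and `target_p` are hypothetical callables returning a next-token distribution (a generic illustration of speculative decoding, not this paper's Markov-chain analysis):

```python
import random

def speculative_step(ctx, draft_p, target_p, sample, k=4):
    """Draft k tokens with the small model, then verify with the large one."""
    drafts = []
    for _ in range(k):  # cheap sequential drafting
        drafts.append(sample(draft_p(ctx + drafts)))
    accepted = []
    for tok in drafts:
        q = draft_p(ctx + accepted)[tok]            # draft probability
        p = target_p(ctx + accepted).get(tok, 0.0)  # target probability
        if random.random() < min(1.0, p / q):       # standard acceptance test
            accepted.append(tok)
        else:
            # Rejected: the full algorithm resamples this position from the
            # residual target distribution, preserving the target's outputs.
            break
    return accepted
```

Every accepted draft token saves a full decoding step of the large model, which is where the inference acceleration studied in the paper comes from.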
arXiv Detail & Related papers (2024-10-30T01:53:04Z)
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both one-to-one comparison and one-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
- On the True Distribution Approximation of Minimum Bayes-Risk Decoding [3.409873726183299]
Minimum Bayes-risk (MBR) decoding has recently gained renewed attention in text generation.
Previous studies reported that the performance varies with the sampling method.
This study uses anomaly detection to measure the degree of approximation.
arXiv Detail & Related papers (2024-03-31T17:47:22Z)
- Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models [7.770029179741429]
We propose P-precision and P-recall (PP&PR), based on a probabilistic approach that addresses these problems.
We show that our PP&PR provide more reliable estimates for comparing fidelity and diversity than the existing metrics.
arXiv Detail & Related papers (2023-09-04T13:19:17Z)
- Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation [55.92852268168816]
N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks.
Recent studies have revealed a weak correlation between these matching-based metrics and human evaluations.
We propose to utilize multiple references to enhance the consistency between these metrics and human evaluations.
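For context, multi-reference scoring is already supported by standard evaluation tooling; a small sketch with the sacrebleu library (a generic usage example, not this paper's proposed method):

```python
import sacrebleu  # pip install sacrebleu

hypotheses = ["the cat sat on the mat"]
# One list per reference set, each aligned with the hypotheses.
references = [
    ["the cat sat on the mat"],        # reference set 1
    ["a cat was sitting on the mat"],  # reference set 2
]

print(sacrebleu.corpus_bleu(hypotheses, references).score)  # BLEU
print(sacrebleu.corpus_chrf(hypotheses, references).score)  # chrF
```

With more diverse reference sets, n-gram metrics have more chances to credit valid paraphrases, which is the consistency gain the entry describes.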
arXiv Detail & Related papers (2023-08-06T14:49:26Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
- Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in a principled manner for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
The Variational Autoencoder (VAE) approximates the posterior of latent variables via amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
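For reference, the amortized objective mentioned above is the standard evidence lower bound (ELBO), where $q_\phi(z \mid x)$ is the encoder's approximate posterior (textbook background, not DU-VAE's specific regularizer):

```latex
% Standard VAE evidence lower bound (ELBO).
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\bigl[\log p_\theta(x \mid z)\bigr]
\;-\; \mathrm{KL}\!\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```

DU-VAE's diversity and uncertainty regularization acts on the latent space shaped by this objective.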
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Reenvisioning Collaborative Filtering vs Matrix Factorization [65.74881520196762]
Collaborative filtering models based on matrix factorization and learned similarities using Artificial Neural Networks (ANNs) have gained significant attention in recent years.
The use of ANNs within the recommendation ecosystem has recently been questioned, raising several comparisons in terms of efficiency and effectiveness.
We show the potential these techniques may have for beyond-accuracy evaluation while analyzing their effect on complementary evaluation dimensions.
arXiv Detail & Related papers (2021-07-28T16:29:38Z)
- Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation [26.33252528975464]
Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words.
Recent work has tied these shortcomings to beam search.
Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead.
arXiv Detail & Related papers (2021-05-18T13:31:05Z)
- Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction [63.202627467245584]
We introduce a Bayesian variational framework to quantify the model-immanent (epistemic) uncertainty.
We demonstrate that our approach yields competitive results for undersampled MRI reconstruction.
arXiv Detail & Related papers (2021-02-12T18:08:14Z)
- On the Relation between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation [86.11292297348622]
We show that a linear combination of quality and diversity constitutes a divergence metric between the generated distribution and the real distribution.
We propose CR/NRR as a substitute for the quality/diversity metric pair.
arXiv Detail & Related papers (2020-07-03T04:06:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.