Diversified Sampling Improves Scaling LLM inference
- URL: http://arxiv.org/abs/2502.11027v1
- Date: Sun, 16 Feb 2025 07:37:58 GMT
- Title: Diversified Sampling Improves Scaling LLM inference
- Authors: Tianchun Wang, Zichuan Liu, Yuanzhou Chen, Jonathan Light, Haifeng Chen, Xiang Zhang, Wei Cheng,
- Abstract summary: DivSampling is a novel and versatile sampling technique designed to enhance the diversity of candidate solutions.
Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower compared to those produced by stationary prompts.
- Score: 31.18762591875725
- License:
- Abstract: While increasing training compute has significantly improved the performance of large language models (LLMs), similar gains have not been observed when scaling inference compute. We hypothesize that the primary issue lies in the uniformity of LLM outputs, which leads to inefficient sampling as models repeatedly generate similar but inaccurate responses. Motivated by an intriguing relationship between solution accuracy (Pass@10) and response diversity, we propose DivSampling-a novel and versatile sampling technique designed to enhance the diversity of candidate solutions by introducing prompt perturbations.DivSampling incorporates two categories of perturbations: task-agnostic approaches, which are general and not tailored to any specific task, and task-specific approaches, which are customized based on task content. Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower compared to those produced by stationary prompts. Comprehensive evaluations across various tasks -including reasoning, mathematics, and code generation - highlight the effectiveness of DivSampling in improving solution accuracy. This scalable and efficient approach offers a new perspective on optimizing test-time inference, addressing limitations in current sampling strategies.
Related papers
- Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach.
Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
arXiv Detail & Related papers (2025-01-15T04:09:21Z) - Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference.
We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem.
Experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z) - Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo [55.452453947359736]
We introduce a novel verification method based on Twisted Sequential Monte Carlo (TSMC)
We apply TSMC to Large Language Models by estimating the expected future rewards at partial solutions.
This approach results in a more straightforward training target that eliminates the need for step-wise human annotations.
arXiv Detail & Related papers (2024-10-02T18:17:54Z) - Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z) - Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z) - A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Optimizing Hyperparameters with Conformal Quantile Regression [7.316604052864345]
We propose to leverage conformalized quantile regression which makes minimal assumptions about the observation noise.
This translates to quicker HPO convergence on empirical benchmarks.
arXiv Detail & Related papers (2023-05-05T15:33:39Z) - Adaptive Meta-learner via Gradient Similarity for Few-shot Text
Classification [11.035878821365149]
We propose a novel Adaptive Meta-learner via Gradient Similarity (AMGS) to improve the model generalization ability to a new task.
Experimental results on several benchmarks demonstrate that the proposed AMGS consistently improves few-shot text classification performance.
arXiv Detail & Related papers (2022-09-10T16:14:53Z) - Robust Topology Optimization Using Multi-Fidelity Variational Autoencoders [1.0124625066746595]
A robust topology optimization (RTO) problem identifies a design with the best average performance.
A neural network method is proposed that offers computational efficiency.
Numerical application of the method is shown on the robust design of L-bracket structure with single point load as well as multiple point loads.
arXiv Detail & Related papers (2021-07-19T20:40:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.