Soft Self-Consistency Improves Language Model Agents
- URL: http://arxiv.org/abs/2402.13212v2
- Date: Wed, 5 Jun 2024 19:50:19 GMT
- Title: Soft Self-Consistency Improves Language Model Agents
- Authors: Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
- Abstract summary: Current "sample and select" methods rely on majority voting to score answers.
Soft Self-Consistency (SOFT-SC) replaces SC's discontinuous scoring with a continuous score computed from model likelihoods.
For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping, and a 4.7% increase for an interactive household game.
- Score: 57.66282463340297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generations from large language models (LLMs) can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (SOFT-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. SOFT-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that SOFT-SC can be applied to both open-source and black-box models.
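As a reading aid (this is not the authors' released code), here is a minimal Python sketch of the contrast the abstract describes: standard SC picks the most-voted answer, while a soft, likelihood-based score lets us rank candidates even when every sampled action is distinct. The mean-token-log-probability aggregation, the helper names, and the toy bash commands are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple


def majority_vote(candidates: List[str]) -> str:
    """Standard self-consistency (SC): pick the most frequent candidate answer.
    With many distinct valid actions, every count may be 1 and the vote is uninformative."""
    return Counter(candidates).most_common(1)[0][0]


def soft_score(token_logprobs: List[float]) -> float:
    """Illustrative continuous score: mean token log-probability of a sampled action.
    (An assumption for this sketch; the paper's exact aggregation may differ.)"""
    return sum(token_logprobs) / max(len(token_logprobs), 1)


def soft_self_consistency(samples: List[Tuple[str, List[float]]]) -> str:
    """Select the sampled action with the highest likelihood-based score."""
    return max(samples, key=lambda s: soft_score(s[1]))[0]


# Three distinct bash commands: majority voting cannot break the tie,
# but the soft score still ranks them by model confidence.
samples = [
    ("grep -rn 'TODO' .", [-0.2, -0.1, -0.3]),
    ("find . -name '*.sh'", [-1.5, -0.9, -1.2]),
    ("ls -la | grep TODO", [-0.8, -0.6, -0.7]),
]
print(majority_vote([cmd for cmd, _ in samples]))  # arbitrary pick: every candidate appears once
print(soft_self_consistency(samples))              # picks the highest mean log-probability
```

In the toy example, each command appears only once, so the vote reduces to an arbitrary tie-break, while the soft score prefers the candidate with the highest average log-probability.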
Related papers
- Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [9.44858963874474]
Self-Consistency (SC) results in significant computational costs proportional to the number of samples generated.
We propose Reasoning-Aware Self-Consistency (RASC), an innovative early-stopping framework that adjusts the number of sample generations.
RASC significantly reduces sample usage by an average of 80% while maintaining or improving accuracy by up to 5% compared to the original SC.
arXiv Detail & Related papers (2024-08-30T05:14:59Z)
- On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z)
- Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation [20.138831477848615]
We propose Fine-Grained Self-Consistency (FSC) to optimize output quality by effectively integrating fine-grained consensus knowledge from multiple samples.
The effectiveness of FSC is demonstrated through extensive experiments on various tasks, including summarization, code generation, and mathematical reasoning.
arXiv Detail & Related papers (2024-07-02T08:38:31Z)
- Atomic Self-Consistency for Better Long Form Generations [12.753854064540636]
Atomic Self-Consistency (ASC) is a technique for improving the recall of relevant information in a long-form response.
ASC follows recent work, Universal Self-Consistency (USC), in using multiple samples to improve the long-form response.
Through extensive experiments and ablations, we show that merging relevant subparts of multiple samples performs significantly better than picking a single sample.
arXiv Detail & Related papers (2024-05-21T18:05:44Z)
- Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning [15.088675135566646]
Self-consistency (SC) has been a widely used decoding strategy for chain-of-thought reasoning.
We propose a simple and scalable sampling process, Early-Stopping Self-Consistency (ESC), to greatly reduce the cost of SC without sacrificing performance (a minimal sketch of this family of early-stopping samplers appears after the related-papers list).
arXiv Detail & Related papers (2024-01-19T04:03:59Z)
- Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks.
We instruct an LLM to self-evaluate its answers.
We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)
- Universal Self-Consistency for Large Language Model Generation [72.6761480346095]
Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on challenging tasks.
We propose Universal Self-Consistency (USC), which leverages large language models (LLMs) to select the most consistent answer.
arXiv Detail & Related papers (2023-11-29T02:07:09Z)
- Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs [60.58434523646137]
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency.
We introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question.
Our experiments show that Adaptive-Consistency reduces the sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.
arXiv Detail & Related papers (2023-05-19T17:49:25Z)
- SelectAugment: Hierarchical Deterministic Sample Selection for Data Augmentation [72.58308581812149]
We propose an effective approach, dubbed SelectAugment, to select samples to be augmented in a deterministic and online manner.
Specifically, in each batch, we first determine the augmentation ratio, and then decide whether to augment each training sample under this ratio.
In this way, the negative effects of randomness in selecting samples to augment are alleviated and the effectiveness of data augmentation (DA) is improved.
arXiv Detail & Related papers (2021-12-06T08:38:38Z)
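Several of the entries above (RASC, ESC, Adaptive-Consistency) share one idea: draw samples one at a time and stop early once the running majority looks settled, instead of always spending a fixed budget. The Python sketch below illustrates that family under simple assumptions; the threshold-based stopping rule, the parameter defaults, and the function names are illustrative, not any paper's exact criterion.

```python
import random
from collections import Counter
from typing import Callable


def adaptive_self_consistency(
    sample_answer: Callable[[], str],  # draws one answer from the LLM (assumed to be provided)
    max_samples: int = 40,
    min_samples: int = 3,
    threshold: float = 0.8,            # stop once the leading answer holds this fraction of votes
) -> str:
    """Illustrative early-stopping loop: draw answers one at a time and stop as soon as
    one answer clearly dominates, instead of always spending the full sample budget."""
    counts: Counter = Counter()
    for i in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        answer, votes = counts.most_common(1)[0]
        if i >= min_samples and votes / i >= threshold:
            return answer               # confident early: the remaining samples are saved
    return counts.most_common(1)[0][0]  # budget exhausted: fall back to a plain majority vote


# Toy usage with a stand-in sampler (a real sampler would query an LLM).
print(adaptive_self_consistency(lambda: random.choice(["42", "42", "42", "41"])))
```

On easy questions the vote settles after a few draws and the loop returns early; on hard ones it spends the full budget, which is how these methods trade a small accuracy risk for large sample savings.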
This list is automatically generated from the titles and abstracts of the papers in this site.