The Consensus Game: Language Model Generation via Equilibrium Search
- URL: http://arxiv.org/abs/2310.09139v1
- Date: Fri, 13 Oct 2023 14:27:21 GMT
- Title: The Consensus Game: Language Model Generation via Equilibrium Search
- Authors: Athul Paul Jacob, Yikang Shen, Gabriele Farina and Jacob Andreas
- Abstract summary: We introduce a new, training-free, game-theoretic procedure for language model decoding.
Our approach casts language model decoding as a regularized imperfect-information sequential signaling game.
Applied to LLaMA-7B, EQUILIBRIUM-RANKING outperforms the much larger LLaMA-65B and PaLM-540B models on multiple benchmarks.
- Score: 73.51411916625032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When applied to question answering and other text generation tasks, language
models (LMs) may be queried generatively (by sampling answers from their output
distribution) or discriminatively (by using them to score or rank a set of
candidate outputs). These procedures sometimes yield very different
predictions. How do we reconcile mutually incompatible scoring procedures to
obtain coherent LM predictions? We introduce a new, training-free,
game-theoretic procedure for language model decoding. Our approach casts
language model decoding as a regularized imperfect-information sequential
signaling game - which we term the CONSENSUS GAME - in which a GENERATOR seeks
to communicate an abstract correctness parameter using natural language
sentences to a DISCRIMINATOR. We develop computational procedures for finding
approximate equilibria of this game, resulting in a decoding algorithm we call
EQUILIBRIUM-RANKING. Applied to a large number of tasks (including reading
comprehension, commonsense reasoning, mathematical problem-solving, and
dialog), EQUILIBRIUM-RANKING consistently, and sometimes substantially,
improves performance over existing LM decoding procedures - on multiple
benchmarks, we observe that applying EQUILIBRIUM-RANKING to LLaMA-7B
outperforms the much larger LLaMA-65B and PaLM-540B models. These results
highlight the promise of game-theoretic tools for addressing fundamental
challenges of truthfulness and consistency in LMs.
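The abstract describes EQUILIBRIUM-RANKING only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the correctness parameter v takes two values ("correct", "incorrect") drawn uniformly by nature. The GENERATOR sees v and picks one of a fixed list of candidate answers. The DISCRIMINATOR sees the chosen candidate and guesses v, and both players are rewarded when the guess matches. Initial policies are assumed to come from the LM's own generative and discriminative scores. To approximate an equilibrium of the regularized game, the sketch uses one standard choice: a KL-regularized multiplicative-weights (hedge) update that pulls each player toward its initial policy with weight `lam`. The names `equilibrium_rank`, `lam`, `eta`, and `iters` are hypothetical.

```python
import math

# Assumption: the correctness parameter v takes two values, drawn uniformly by "nature".
LABELS = ("correct", "incorrect")


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def equilibrium_rank(gen_init, disc_init, lam=0.1, eta=0.1, iters=500):
    """Approximate a KL-regularized equilibrium of a two-player signaling game (sketch).

    gen_init[v][i]  : initial generator policy P_G(y_i | v), strictly positive,
                      normalized over the candidate list for each label v.
    disc_init[i][v] : initial discriminator policy P_D(v | y_i), strictly positive,
                      normalized over LABELS for each candidate i.
    Returns (ranking, gen, disc): candidate indices sorted by the final-iterate
    P_D("correct" | y_i), plus the final generator and discriminator policies.
    """
    n = len(disc_init)
    gen = {v: list(gen_init[v]) for v in LABELS}
    disc = [dict(d) for d in disc_init]
    # Running sums of each player's past policies; averaged over t, they define
    # the other player's payoffs.
    sum_gen = {v: [0.0] * n for v in LABELS}
    sum_disc = [{v: 0.0 for v in LABELS} for _ in range(n)]
    for t in range(1, iters + 1):
        for i in range(n):
            for v in LABELS:
                sum_gen[v][i] += gen[v][i]
                sum_disc[i][v] += disc[i][v]
        # Effective temperature: shrinks as t grows but always stays above lam.
        temp = 1.0 / (eta * t) + lam
        # Generator: sending y_i under label v pays the average P_D(v | y_i) times the
        # prior 1/2, plus a pull (weight lam) toward the initial LM policy.
        gen = {
            v: softmax([
                (sum_disc[i][v] / (2 * t) + lam * math.log(gen_init[v][i])) / temp
                for i in range(n)
            ])
            for v in LABELS
        }
        # Discriminator: guessing v on y_i pays the average P_G(y_i | v) times the prior 1/2.
        disc = [
            dict(zip(LABELS, softmax([
                (sum_gen[v][i] / (2 * t) + lam * math.log(disc_init[i][v])) / temp
                for v in LABELS
            ])))
            for i in range(n)
        ]
    ranking = sorted(range(n), key=lambda i: disc[i]["correct"], reverse=True)
    return ranking, gen, disc


if __name__ == "__main__":
    # Toy scores for two candidate answers; both players initially favor candidate 0.
    ranking, gen, disc = equilibrium_rank(
        {"correct": [0.6, 0.4], "incorrect": [0.4, 0.6]},
        [{"correct": 0.7, "incorrect": 0.3}, {"correct": 0.3, "incorrect": 0.7}],
    )
    print(ranking)  # [0, 1]
```

In this toy case the generative and discriminative scores already agree, so the ranking simply confirms candidate 0. The game matters when the two scoring procedures disagree, because each player then adapts to the other. The weight `lam` limits how far the final policies may drift from the LM's initial scores. Ranking by the final iterate, rather than by an average of iterates, is also an assumption of this sketch.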
Related papers
- Set-Based Prompting: Provably Solving the Language Model Order Dependency Problem [18.020492646988746]
We present Set-Based Prompting, a technique that guarantees the output of an LLM will not have order dependence on a specified set of sub-sequences.
Despite our inputs being out of distribution, the impact on expected accuracy is small, where the expectation is over the order of uniformly chosen shuffling of the candidate responses.
(2024-06-04T16:09:13Z)
- Automated Assessment of Students' Code Comprehension using LLMs [0.3293989832773954]
Large Language Models (LLMs) and encoder-based Semantic Textual Similarity (STS) models are assessed.
Our findings indicate that LLMs, when prompted in few-shot and chain-of-thought settings, perform comparably to fine-tuned encoder-based models in evaluating students' short answers in the programming domain.
(2023-12-19T20:39:12Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
(2023-08-23T12:36:57Z)
- ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers [60.6418431624873]
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems.
We propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness.
Experiments show that when equipped with ALGO, we achieve an 8x better one-submission pass rate over the Codex model and a 2.6x better one-submission pass rate over CodeT.
(2023-05-24T00:10:15Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
(2023-05-16T17:55:51Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of its evaluation datasets.
(2023-02-16T18:23:22Z)
- PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems.
PaL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results in all 12 benchmarks.
(2022-11-18T18:56:13Z)
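The PaL entry above states that the solution step is offloaded to a Python interpreter. The sketch below shows that division of labor under stated assumptions. It assumes the model has been prompted to write its reasoning as a Python function named `solution()`, a convention chosen here for illustration. The names `run_pal_program` and `generated_program` are hypothetical, and no LLM is called: the program string is hard-coded in place of model output.

```python
def run_pal_program(program_text):
    """Execute model-written Python and return the result of its solution() function.

    program_text stands in for LLM output. exec() runs arbitrary code, so a real
    system should sandbox it; this sketch does not.
    """
    namespace = {}
    exec(program_text, namespace)  # defines solution() inside a fresh namespace dict
    return namespace["solution"]()


# Hypothetical model output for the word problem:
# "Olivia has $23. She bought five bagels for $3 each. How much money does she have left?"
generated_program = """
def solution():
    money_initial = 23
    bagels = 5
    bagel_cost = 3
    money_spent = bagels * bagel_cost
    return money_initial - money_spent
"""

print(run_pal_program(generated_program))  # 8
```

The model only has to decompose the problem into variables and operations. The interpreter, not the model, performs the arithmetic (23 - 5 * 3 = 8).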