Language Model Decoding as Likelihood-Utility Alignment
- URL: http://arxiv.org/abs/2210.07228v1
- Date: Thu, 13 Oct 2022 17:55:51 GMT
- Title: Language Model Decoding as Likelihood-Utility Alignment
- Authors: Martin Josifoski, Maxime Peyrard, Frano Rajic, Jiheng Wei, Debjit
Paul, Valentin Hartmann, Barun Patra, Vishrav Chaudhary, Emre Kıcıman,
Boi Faltings, Robert West
- Abstract summary: We introduce a taxonomy that groups decoding strategies based on their implicit assumptions about how well the model's likelihood is aligned with the task-specific notion of utility.
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide the first empirical evidence supporting the proposed taxonomy.
- Score: 54.70547032876017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A critical component of a successful language generation pipeline is the
decoding algorithm. However, the general principles that should guide the
choice of decoding algorithm remain unclear. Previous works only compare
decoding algorithms in narrow scenarios and their findings do not generalize
across tasks. To better structure the discussion, we introduce a taxonomy that
groups decoding strategies based on their implicit assumptions about how well
the model's likelihood is aligned with the task-specific notion of utility. We
argue that this taxonomy allows a broader view of the decoding problem and can
lead to generalizable statements because it is grounded on the interplay
between the decoding algorithms and the likelihood-utility misalignment.
Specifically, by analyzing the correlation between the likelihood and the
utility of predictions across a diverse set of tasks, we provide the first
empirical evidence supporting the proposed taxonomy, and a set of principles to
structure reasoning when choosing a decoding algorithm. Crucially, our analysis
is the first one to relate likelihood-based decoding strategies with strategies
that rely on external information such as value-guided methods and prompting,
and covers the most diverse set of tasks to date.
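To make the paper's diagnostic concrete, here is a minimal sketch (not the authors' code; the scoring callables are hypothetical stand-ins) of how one might rank-correlate a model's sequence log-likelihoods with a task utility to gauge how well likelihood serves as a proxy for utility:

```python
# Minimal sketch (not the authors' code): estimate likelihood-utility
# alignment for a task by rank-correlating model log-likelihoods with a
# task-specific utility across candidate predictions.
from scipy.stats import spearmanr

def likelihood_utility_correlation(candidates, log_likelihood, utility):
    """`log_likelihood` and `utility` are hypothetical callables that
    score a single candidate prediction."""
    lls = [log_likelihood(c) for c in candidates]
    scores = [utility(c) for c in candidates]
    rho, _ = spearmanr(lls, scores)  # rank correlation; robust to monotone rescaling
    return rho

# Toy usage with stand-in scorers (shorter strings = more likely and more useful).
cands = ["a", "bb", "ccc", "dddd"]
print(likelihood_utility_correlation(
    cands,
    log_likelihood=lambda c: -len(c),               # stand-in model score
    utility=lambda c: 1.0 if len(c) <= 2 else 0.0,  # stand-in task metric
))
```

Roughly, under the paper's framing, a strong positive correlation supports mode-seeking strategies that trust the likelihood, while a weak correlation motivates strategies that inject external information, such as value-guided decoding.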
Related papers
- Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension [9.67774998354062]
Previous research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation.
We propose a Premise-Oriented Data Augmentation (PODA) framework to generate CoT rationales including analyses for both correct and incorrect options.
We also introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples.
arXiv Detail & Related papers (2024-09-22T15:44:43Z)
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data.
Recent trends demonstrate the potential homogeneity between coding and intelligence.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- e-COP : Episodic Constrained Optimization of Policies [12.854752753529151]
We present the first policy optimization algorithm for constrained Reinforcement Learning (RL) in episodic (finite horizon) settings.
We show that our algorithm has similar or better performance than SoTA (non-episodic) algorithms adapted for the episodic setting.
arXiv Detail & Related papers (2024-06-13T20:12:09Z)
- A Framework for Guided Motion Planning [1.179253400575852]
We formalize the notion of guided search by defining the concept of a guiding space.
This new language encapsulates many seemingly distinct prior methods under the same framework.
We suggest an information theoretic method to evaluate guidance, which experimentally matches intuition when tested on known algorithms.
arXiv Detail & Related papers (2024-04-04T00:58:19Z)
- A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
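As a toy illustration of why the choice matters (not code from the surveyed paper; the vocabulary and scores are made up), the same next-token distribution decoded greedily versus by temperature sampling behaves quite differently:

```python
# Toy illustration (not from the surveyed paper): the same next-token
# distribution decoded greedily vs. by temperature sampling.
import math
import random

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

vocab = ["the", "a", "an"]  # hypothetical 3-token vocabulary
logits = [2.0, 1.5, 0.1]    # made-up next-token scores

greedy = vocab[max(range(len(logits)), key=logits.__getitem__)]
sampled = random.choices(vocab, weights=softmax(logits, temperature=0.7), k=1)[0]
print(greedy, sampled)  # greedy is deterministic; sampling trades accuracy for diversity
```

Closed-ended tasks tend to reward the deterministic, mode-seeking behavior, while open-ended generation often benefits from the diversity of sampling, which is one way the task-dependence noted above shows up.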
arXiv Detail & Related papers (2024-02-10T11:14:53Z)
- Encoding Version History Context for Better Code Representation [13.045078976464307]
This paper presents preliminary evidence of the potential benefit of encoding contextual information from the version history to predict code clones and perform code classification.
To ensure the technique performs consistently, we need to conduct a holistic investigation on a larger code base using different combinations of contexts, aggregation, and models.
arXiv Detail & Related papers (2024-02-06T07:35:36Z)
- Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding [46.485363806259265]
Speculative Decoding has emerged as a novel decoding paradigm for Large Language Models (LLMs) inference.
In each decoding step, this method first drafts several future tokens efficiently and then verifies them in parallel.
This paper presents a comprehensive overview and analysis of this promising decoding paradigm.
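A minimal sketch of that draft-then-verify step, assuming toy static distributions in place of real draft and target models (the names are illustrative, not from the survey):

```python
# Minimal sketch of one draft-then-verify step, with toy static
# distributions standing in for real draft/target models.
import random

def speculative_step(p_target, q_draft, vocab, k=4):
    """Draft up to k tokens from q_draft, verify each against p_target with
    the standard accept probability min(1, p/q); stop at the first rejection."""
    out = []
    for _ in range(k):
        tok = random.choices(vocab, weights=[q_draft[t] for t in vocab], k=1)[0]
        if random.random() < min(1.0, p_target[tok] / q_draft[tok]):
            out.append(tok)  # verified: the target model accepts the draft token
        else:
            # On rejection, resample from the residual max(0, p - q), renormalized.
            # A rejection implies p(tok) < q(tok), so the residual mass is nonzero.
            resid = {t: max(0.0, p_target[t] - q_draft[t]) for t in vocab}
            z = sum(resid.values())
            out.append(random.choices(vocab, weights=[resid[t] / z for t in vocab], k=1)[0])
            break
    return out

vocab = ["a", "b", "c"]
p = {"a": 0.6, "b": 0.3, "c": 0.1}  # made-up target-model probabilities
q = {"a": 0.5, "b": 0.4, "c": 0.1}  # made-up draft-model probabilities
print(speculative_step(p, q, vocab))
```

A real implementation conditions both distributions on the growing prefix and samples one extra token from the target model when all k drafts are accepted; this sketch omits those details.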
arXiv Detail & Related papers (2024-01-15T17:26:50Z)
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose DGMS, an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with the unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
- Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment [74.0482641714311]
We introduce two coordinated reasoning methods, i.e., the Easy-to-Hard decoding strategy and joint entity alignment algorithm.
Our model achieves the state-of-the-art performance and our reasoning methods can also significantly improve existing baselines.
arXiv Detail & Related papers (2020-01-23T18:41:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.