Local and Global Decoding in Text Generation
- URL: http://arxiv.org/abs/2410.10810v1
- Date: Mon, 14 Oct 2024 17:59:38 GMT
- Title: Local and Global Decoding in Text Generation
- Authors: Daniel Gareev, Thomas Hofmann, Ezhilmathi Krishnasamy, Tiago Pimentel
- Abstract summary: Text generation relies on decoding algorithms that sample strings from a language model distribution.
Local normalisation, as used in methods such as top-$k$ and top-$\pi$, can distort the model's distribution; we investigate the effect of this distortion by introducing globally-normalised versions of these decoding methods.
Our results suggest that distortion is an important feature of local decoding algorithms.
- Score: 36.38298679687864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text generation, a key component in applications such as dialogue systems, relies on decoding algorithms that sample strings from a language model distribution. Traditional methods, such as top-$k$ and top-$\pi$, apply local normalisation to the model's output distribution, which can distort it. In this paper, we investigate the effect of this distortion by introducing globally-normalised versions of these decoding methods. Additionally, we propose an independent Metropolis-Hastings algorithm to approximate sampling from globally-normalised distributions without explicitly computing them. Our empirical analysis compares the performance of local and global normalisation across two decoding algorithms (top-$k$ and top-$\pi$) with various hyperparameters, using Pythia language models. Results show that, in most configurations, global decoding performs worse than the local decoding version of the same algorithms -- despite preserving the distribution's integrity. Our results suggest that distortion is an important feature of local decoding algorithms.
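As a rough illustration of the contrast the abstract describes, here is a minimal sketch (not the authors' implementation; the PyTorch usage, function names, and per-step bookkeeping are assumptions) of a locally-normalised top-$k$ step and the independent Metropolis-Hastings acceptance test that makes global normalisation tractable, since the unknown global normaliser cancels in the acceptance ratio:

```python
import torch

def local_top_k_step(logits: torch.Tensor, k: int):
    """One step of locally-normalised top-k sampling: keep the k most
    likely tokens and renormalise their mass to sum to one."""
    probs = torch.softmax(logits, dim=-1)
    top_probs, top_idx = probs.topk(k)
    kept_mass = top_probs.sum()            # Z_t: probability mass surviving truncation
    local = top_probs / kept_mass          # this renormalisation is the local distortion
    choice = torch.multinomial(local, 1)
    token = top_idx[choice].item()
    # per-step log-probabilities under the (truncated) model and the local proposal
    return token, torch.log(top_probs[choice]).item(), torch.log(local[choice]).item()

def mh_accept(logp_new, logq_new, logp_cur, logq_cur):
    """Independent Metropolis-Hastings test for global decoding: the target is
    the truncated model distribution up to its intractable global normaliser,
    the proposal q is the local decoder; the normaliser cancels in the ratio."""
    log_alpha = (logp_new - logq_new) - (logp_cur - logq_cur)
    return torch.log(torch.rand(())).item() < log_alpha
```

Summing the per-step log terms over a whole sampled string yields the log $p$ and log $q$ values that enter mh_accept, so the globally-normalised distribution is never computed explicitly, matching the abstract's description at a high level.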
Related papers
- Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models [0.0]
We develop the theory of decoding strategies for language models by expressing popular decoding algorithms as equilibrium states in the language of ergodic theory.
We analyze the effect of the local normalization step of top-k, nucleus, and temperature sampling, used to make probabilities sum to one.
Contrary to the prevailing explanation, we argue that the major cause of the under-performance of top-k sampling relative to nucleus sampling is local normalization distortion.
arXiv Detail & Related papers (2025-03-27T19:15:43Z)
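To see why this local renormalisation matters, consider a toy example (all numbers invented purely for illustration): because the kept mass $Z_t$ depends on the prefix, two strings can swap rank under local decoding even when the model prefers the opposite order.

```python
from math import prod

# Assumed toy numbers: p_t is the model probability of the chosen token at
# step t, and z_t is the total mass kept by truncation at that step. Local
# decoding scores a string as prod_t p_t / z_t instead of prod_t p_t, and
# z_t varies with the prefix.
p_a, z_a = [0.9, 0.4], [0.95, 0.5]   # string A
p_b, z_b = [0.5, 0.8], [0.90, 0.9]   # string B

model_a, model_b = prod(p_a), prod(p_b)            # 0.36 vs 0.40
local_a = prod(p / z for p, z in zip(p_a, z_a))    # ~0.76
local_b = prod(p / z for p, z in zip(p_b, z_b))    # ~0.49
print(model_a < model_b, local_a > local_b)        # True True: the ranking flips
```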
- The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs [4.735670734773145]
We show the existence of a strict tradeoff between channel rate, distortion, perception, and classification accuracy.
We propose two image compression methods to navigate that tradeoff: the CO algorithm and ID-GAN, a more general compression method.
Experiments demonstrate that the proposed ID-GAN algorithm balances image distortion, perception, and classification accuracy, and significantly outperforms traditional separation-based methods.
arXiv Detail & Related papers (2023-12-22T16:06:43Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate text that aligns with human-written text across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
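Constrained problems of this kind, matching expected metrics while staying close to a base distribution, admit solutions of the exponential-tilting form $q(x) \propto p_{\text{lm}}(x)\exp(\sum_i \lambda_i f_i(x))$. The sketch below shows only that generic construction, as a hedged assumption about the shape of the solution rather than the paper's actual algorithm; all names and numbers are hypothetical.

```python
import math

def tilted_weights(logp_lm, feats, lam):
    """Self-normalised weights for q(x) proportional to
    p_lm(x) * exp(sum_i lam_i * f_i(x)). Generic exponential-tilting
    illustration only, not the paper's method."""
    logw = [lp + sum(l * f for l, f in zip(lam, fs))
            for lp, fs in zip(logp_lm, feats)]
    m = max(logw)                      # subtract max for numerical stability
    w = [math.exp(x - m) for x in logw]
    s = sum(w)
    return [x / s for x in w]

# Hypothetical candidate strings: LM log-probs and two metric features each.
weights = tilted_weights(
    logp_lm=[-3.2, -3.5, -4.0],
    feats=[(0.8, 0.1), (0.6, 0.3), (0.9, 0.2)],
    lam=(1.5, -0.5),   # multipliers tuned so expected metrics match human text
)
print(weights)
```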
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- Massive-scale Decoding for Text Generation using Lattices [34.2658286826597]
We present a search algorithm to construct lattices encoding a massive number of generation options.
We show that our algorithm encodes hundreds to thousands of diverse, grammatical, high-quality options into one linear-sized lattice.
arXiv Detail & Related papers (2021-12-14T18:56:11Z)
- Global and Local Alignment Networks for Unpaired Image-to-Image Translation [170.08142745705575]
The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style.
Because existing methods pay little attention to content changes, semantic information from source images degrades during translation.
We introduce a novel approach, Global and Local Alignment Networks (GLA-Net).
Our method effectively generates sharper and more realistic images than existing approaches.
arXiv Detail & Related papers (2021-11-19T18:01:54Z)
- Learning Multiple Sound Source 2D Localization [7.564344795030588]
We propose novel deep-learning-based algorithms for multiple sound source localization.
We use an encoding-decoding architecture and propose two improvements on it to accomplish the task.
New metrics relying on resolution-based multiple-source association are developed.
arXiv Detail & Related papers (2020-12-10T08:51:16Z)
- Community detection using fast low-cardinality semidefinite programming [94.4878715085334]
We propose a new low-cardinality algorithm that generalizes the local update to maximize a semidefinite relaxation derived from max-k-cut.
The proposed algorithm is scalable and outperforms state-of-the-art algorithms on real-world datasets with little additional time cost.
arXiv Detail & Related papers (2020-12-04T15:46:30Z)
- Principles and Algorithms for Forecasting Groups of Time Series: Locality and Globality [0.5076419064097732]
We formalize the setting of forecasting a set of time series with local and global methods.
Global models can succeed in a wider range of problems than previously thought.
Purposely naive algorithms derived from these principles result in superior accuracy.
arXiv Detail & Related papers (2020-08-02T10:22:05Z)
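A minimal sketch of the local-versus-global distinction formalised above (the synthetic data, AR(1) model form, and least-squares fitting are all assumptions made for illustration; the paper's principles and algorithms are more general):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic group of related series (assumed data, purely for illustration).
series = [np.cumsum(rng.normal(size=200)) for _ in range(50)]

def ar1_fit(x, y):
    """Least-squares AR(1) fit: y_{t+1} ~ a * y_t + b."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Local: one model per series, fit on that series alone.
local_models = [ar1_fit(s[:-1], s[1:]) for s in series]

# Global: a single model fit on the pooled transitions of every series.
x_all = np.concatenate([s[:-1] for s in series])
y_all = np.concatenate([s[1:] for s in series])
global_model = ar1_fit(x_all, y_all)
```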
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
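A minimal sketch of the "computation then aggregation" (CTA) pattern mentioned above, in the FedAvg style (the least-squares loss, equal client weighting, and all hyperparameters are illustrative assumptions, not FedPD itself):

```python
import numpy as np

def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
    """One CTA round in the FedAvg style: each client takes local gradient
    steps on its own data, then the server averages the client models."""
    client_ws = []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):               # computation, on-device
            grad = X.T @ (X @ w - y) / len(y)      # least-squares gradient (up to a factor of 2)
            w -= lr * grad
        client_ws.append(w)
    return np.mean(client_ws, axis=0)              # aggregation, at the server

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):                                # communication rounds
    w = fedavg_round(w, clients)
```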
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.