EEL: Efficiently Encoding Lattices for Reranking
- URL: http://arxiv.org/abs/2306.00947v1
- Date: Thu, 1 Jun 2023 17:45:32 GMT
- Title: EEL: Efficiently Encoding Lattices for Reranking
- Authors: Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett
- Abstract summary: We use Transformers to efficiently encode lattices of generated outputs.
We combine this approach with a new class of token-factored rerankers (TFRs).
Our results show both substantial speedup compared to naive reranking and often better performance on downstream metrics than comparable approaches.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard decoding approaches for conditional text generation tasks typically
search for an output hypothesis with high model probability, but this may not
yield the best hypothesis according to human judgments of quality. Reranking to
optimize for "downstream" metrics can better optimize for quality, but many
metrics of interest are computed with pre-trained language models, which are
slow to apply to large numbers of hypotheses. We explore an approach for
reranking hypotheses by using Transformers to efficiently encode lattices of
generated outputs, a method we call EEL. With a single Transformer pass over
the entire lattice, we can approximately compute a contextualized
representation of each token as if it were only part of a single hypothesis in
isolation. We combine this approach with a new class of token-factored
rerankers (TFRs) that allow for efficient extraction of high reranker-scoring
hypotheses from the lattice. Empirically, our approach incurs minimal
degradation error compared to the exponentially slower approach of encoding
each hypothesis individually. When applying EEL with TFRs across three text
generation tasks, our results show both substantial speedup compared to naive
reranking and often better performance on downstream metrics than comparable
approaches.
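The key property the abstract describes is that a token-factored reranker's score decomposes over the tokens of a hypothesis, so the highest-scoring hypothesis can be pulled out of the lattice with a max-sum dynamic program rather than by scoring every path separately. The sketch below illustrates that extraction step on a toy lattice; the graph representation, score values, and function names are illustrative assumptions, not the paper's actual implementation (in EEL, the per-token scores would come from a single Transformer pass over the lattice).

```python
# Hedged sketch: extracting the highest-scoring hypothesis from a
# lattice under a token-factored reranker (TFR). Illustrative only.
from functools import lru_cache

def best_path(edges, token_score, start, end):
    """Max-sum dynamic program over a DAG lattice.

    edges: dict mapping node -> list of (next_node, token) transitions
    token_score: dict mapping (node, token) -> per-token reranker score
    Returns (total_score, token_sequence) of the best hypothesis.
    """
    @lru_cache(maxsize=None)
    def follow(node):
        if node == end:
            return 0.0, ()
        best = (float("-inf"), ())
        for nxt, tok in edges.get(node, []):
            score, path = follow(nxt)
            cand = (token_score[(node, tok)] + score, (tok,) + path)
            if cand[0] > best[0]:
                best = cand
        return best

    return follow(start)

# Toy lattice: two hypotheses sharing a prefix ("the cat sat" / "the cat ran").
edges = {0: [(1, "the")], 1: [(2, "cat")], 2: [(3, "sat"), (3, "ran")]}
scores = {(0, "the"): 0.1, (1, "cat"): 0.2, (2, "sat"): 0.5, (2, "ran"): 0.9}
total, tokens = best_path(edges, scores, start=0, end=3)  # picks "the cat ran"
```

Because the DP visits each lattice edge once, extraction cost grows with the size of the lattice rather than with the (potentially exponential) number of paths through it.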
Related papers
- Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73× to 1.96×, significantly surpassing standard speculative decoding.
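The saving a DAG over drafted hypotheses offers is that shared prefixes are verified once instead of once per hypothesis. The tiny count below illustrates that idea; the drafts and the prefix-counting scheme are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch: why organizing drafts as a DAG/trie saves verification
# work. Each distinct prefix is scored once, instead of once per draft.
drafts = [("the", "cat", "sat"), ("the", "cat", "ran"), ("the", "dog", "ran")]

# Naive verification: score every token of every hypothesis.
naive_steps = sum(len(d) for d in drafts)

# DAG verification: score each distinct prefix exactly once.
dag_nodes = {d[:i + 1] for d in drafts for i in range(len(d))}
dag_steps = len(dag_nodes)
```

Here the three drafts contain nine tokens in total, but only six distinct prefixes, so prefix-sharing cuts a third of the verification work in this toy case.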
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
- Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery [0.0]
We focus on distributed estimation and support recovery for high-dimensional linear quantile regression.
We transform the original quantile regression into the least-squares optimization.
An efficient algorithm is developed, which enjoys high computation and communication efficiency.
arXiv Detail & Related papers (2024-05-13T08:32:22Z)
- Self-Consistent Decoding for More Factual Open Responses [28.184313177333642]
"Sample & Select" improves factuality by a 30% relative margin over the DoLA, P-CRR, and S-CRR decoders.
We collect human verifications of the generated summaries, confirming the factual superiority of our method.
arXiv Detail & Related papers (2024-03-01T17:31:09Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt can leverage their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- KNN-LM Does Not Improve Open-ended Text Generation [34.86733697757264]
We study the generation quality of retrieval-augmented language models (LMs)
We find that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM.
We discover that the entropy of the retrieval distribution increases faster than that of the base LM as the generated sequence becomes longer.
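The mechanism this summary refers to is interpolating the base LM's next-token distribution with a retrieval (kNN) distribution. The sketch below illustrates the finding in miniature; the distributions, mixing weight, and function names are illustrative assumptions, not the paper's experimental setup.

```python
# Hedged sketch of kNN-LM interpolation and the entropy observation.
import math

def interpolate(p_lm, p_knn, lam):
    """Mixture p(tok) = lam * p_knn(tok) + (1 - lam) * p_lm(tok)."""
    return {tok: lam * p_knn.get(tok, 0.0) + (1 - lam) * p
            for tok, p in p_lm.items()}

def entropy(p):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

p_lm = {"cat": 0.7, "dog": 0.2, "car": 0.1}     # peaked base-LM distribution
p_knn = {"cat": 0.4, "dog": 0.35, "car": 0.25}  # flatter retrieval distribution
mixed = interpolate(p_lm, p_knn, lam=0.25)

# The blurb's observation in miniature: the retrieval distribution has
# higher entropy than the base LM's, so mixing it in flattens the
# resulting distribution, which can raise perplexity.
```

In this toy case the retrieval distribution's entropy exceeds the base LM's, and the mixture is correspondingly flatter than the base LM alone.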
arXiv Detail & Related papers (2023-05-24T01:48:33Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show that simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Completion [60.52730146391456]
We propose a new nonconvex low-rank regularizer called the "nuclear Frobenius norm" regularizer, which is adaptive and sound.
It bypasses the computation of singular values, allowing fast optimization.
It obtains state-of-the-art recovery performance while being the fastest among existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z)
- Self-Adversarial Learning with Comparative Discrimination for Text Generation [111.18614166615968]
We propose a novel self-adversarial learning (SAL) paradigm for improving GANs' performance in text generation.
During training, SAL rewards the generator when its currently generated sentence is found to be better than its previously generated samples.
Experiments on text generation benchmark datasets show that our proposed approach substantially improves both the quality and the diversity of generated text.
arXiv Detail & Related papers (2020-01-31T07:50:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.