EEL: Efficiently Encoding Lattices for Reranking
- URL: http://arxiv.org/abs/2306.00947v1
- Date: Thu, 1 Jun 2023 17:45:32 GMT
- Authors: Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett
- Abstract summary: We use Transformers to efficiently encode lattices of generated outputs.
We combine this approach with a new class of token-factored rerankers (TFRs).
Our results show both substantial speedup compared to naive reranking and often better performance on downstream metrics than comparable approaches.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard decoding approaches for conditional text generation tasks typically
search for an output hypothesis with high model probability, but this may not
yield the best hypothesis according to human judgments of quality. Reranking to
optimize for "downstream" metrics can better optimize for quality, but many
metrics of interest are computed with pre-trained language models, which are
slow to apply to large numbers of hypotheses. We explore an approach for
reranking hypotheses by using Transformers to efficiently encode lattices of
generated outputs, a method we call EEL. With a single Transformer pass over
the entire lattice, we can approximately compute a contextualized
representation of each token as if it were only part of a single hypothesis in
isolation. We combine this approach with a new class of token-factored
rerankers (TFRs) that allow for efficient extraction of high reranker-scoring
hypotheses from the lattice. Empirically, our approach incurs minimal
degradation error compared to the exponentially slower approach of encoding
each hypothesis individually. When applying EEL with TFRs across three text
generation tasks, our results show both substantial speedup compared to naive
reranking and often better performance on downstream metrics than comparable
approaches.
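The abstract's key property is that a token-factored reranker scores a hypothesis as a sum of per-token scores, so the best hypothesis can be extracted from the lattice with dynamic programming over a DAG rather than by scoring every path. The following is a minimal illustrative sketch of that extraction step, not the paper's implementation; the lattice, token names, and scores are invented for the example.

```python
# Sketch: extract the highest-scoring hypothesis from a lattice under a
# token-factored reranker (TFR), where a hypothesis's score decomposes as
# the sum of per-token scores. The lattice is a DAG; we run DP in
# topological order. Example data is hypothetical, not from the paper.

from collections import defaultdict

def best_path(edges, token_score, start, end):
    """edges: list of (u, v, token); token_score: dict token -> float."""
    # Build adjacency lists and in-degrees for Kahn's topological sort.
    adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, tok in edges:
        adj[u].append((v, tok))
        indeg[v] += 1
        nodes.update((u, v))
    order = [n for n in nodes if indeg[n] == 0]
    i = 0
    while i < len(order):
        u = order[i]; i += 1
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    # DP: best[n] = (score of best path start -> n, token sequence taken).
    best = {start: (0.0, [])}
    for u in order:
        if u not in best:
            continue
        s, path = best[u]
        for v, tok in adj[u]:
            cand = s + token_score[tok]
            if v not in best or cand > best[v][0]:
                best[v] = (cand, path + [tok])
    return best[end]

# Tiny lattice encoding two hypotheses: "a big cat" and "a huge cat".
edges = [(0, 1, "a"), (1, 2, "big"), (1, 2, "huge"), (2, 3, "cat")]
scores = {"a": 0.0, "big": 0.4, "huge": 0.9, "cat": 0.1}
score, tokens = best_path(edges, scores, 0, 3)
```

Because the score is token-factored, the DP visits each lattice edge once, which is what makes reranking over an exponentially large set of hypotheses tractable; EEL's contribution is producing good per-token scores with a single Transformer pass over the whole lattice.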
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73$\times$ to 1.96$\times$, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
- Self-Consistent Decoding for More Factual Open Responses [28.184313177333642]
"Sample & Select" improves factuality by a 30% relative margin over the DoLA, P-CRR, and S-CRR decoders.
We collect human verifications of the generated summaries, confirming the factual superiority of our method.
arXiv Detail & Related papers (2024-03-01T17:31:09Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Stability-Adjusted Cross-Validation for Sparse Linear Regression [5.156484100374059]
Cross-validation techniques like k-fold cross-validation substantially increase the computational cost of sparse regression.
We propose selecting hyperparameters that minimize a weighted sum of a cross-validation metric and a model's output stability.
Our confidence adjustment procedure reduces test set error by 2%, on average, on 13 real-world datasets.
arXiv Detail & Related papers (2023-06-26T17:02:45Z)
- KNN-LM Does Not Improve Open-ended Text Generation [34.86733697757264]
We study the generation quality of retrieval-augmented language models (LMs).
We find that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM.
We discover that the entropy of the retrieval distribution increases faster than that of the base LM as the generated sequence becomes longer.
arXiv Detail & Related papers (2023-05-24T01:48:33Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show that simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Completion [60.52730146391456]
We propose a new scalable nonconvex low-rank regularizer, the "nuclear Frobenius norm" regularizer, which is adaptive and sound.
It bypasses the computation of singular values, allowing fast optimization.
It obtains state-of-the-art recovery performance while being the fastest among existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.