EEL: Efficiently Encoding Lattices for Reranking
- URL: http://arxiv.org/abs/2306.00947v1
- Date: Thu, 1 Jun 2023 17:45:32 GMT
- Title: EEL: Efficiently Encoding Lattices for Reranking
- Authors: Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett
- Abstract summary: We use Transformers to efficiently encode lattices of generated outputs.
We combine this approach with a new class of token-factored rerankers (TFRs).
Our results show both substantial speedup compared to naive reranking and often better performance on downstream metrics than comparable approaches.
- Score: 44.77383151122229
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard decoding approaches for conditional text generation tasks typically
search for an output hypothesis with high model probability, but this may not
yield the best hypothesis according to human judgments of quality. Reranking to
optimize for "downstream" metrics can better optimize for quality, but many
metrics of interest are computed with pre-trained language models, which are
slow to apply to large numbers of hypotheses. We explore an approach for
reranking hypotheses by using Transformers to efficiently encode lattices of
generated outputs, a method we call EEL. With a single Transformer pass over
the entire lattice, we can approximately compute a contextualized
representation of each token as if it were only part of a single hypothesis in
isolation. We combine this approach with a new class of token-factored
rerankers (TFRs) that allow for efficient extraction of high reranker-scoring
hypotheses from the lattice. Empirically, our approach incurs minimal
degradation error compared to the exponentially slower approach of encoding
each hypothesis individually. When applying EEL with TFRs across three text
generation tasks, our results show both substantial speedup compared to naive
reranking and often better performance on downstream metrics than comparable
approaches.
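
The abstract describes two coupled ideas: encoding an entire lattice of generated outputs in a single Transformer pass, and restricting the reranker to token-factored scores so that a high-scoring hypothesis can be extracted from the lattice efficiently. The sketch below is a hypothetical, simplified illustration of those ideas, not the authors' EEL implementation: the lattice, the reranker scores, and the reachability-based attention mask are all toy stand-ins for the paper's actual masking and position scheme.

```python
# Minimal illustrative sketch (hypothetical code, not the authors' EEL release)
# of: (1) an attention mask over a token lattice so one Transformer pass lets
# each token attend only to tokens that can precede it on some hypothesis, and
# (2) a token-factored reranker (TFR) whose score decomposes over tokens, so
# the best hypothesis is a max-weight path found by dynamic programming.

from collections import defaultdict

# Toy lattice: node -> token, plus directed edges. Node 0 is the start node;
# nodes with no outgoing edges end a hypothesis. Every path through the DAG is
# one candidate hypothesis produced by beam search or sampling.
tokens = {0: "<s>", 1: "the", 2: "a", 3: "cat", 4: "dog", 5: "sat", 6: "</s>"}
edges = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 5), (5, 6)]

succ, pred = defaultdict(list), defaultdict(list)
for u, v in edges:
    succ[u].append(v)
    pred[v].append(u)

nodes = sorted(tokens)

def topo_order(nodes, pred, succ):
    """Topological order of the lattice DAG (Kahn's algorithm)."""
    indeg = {n: len(pred[n]) for n in nodes}
    order = [n for n in nodes if indeg[n] == 0]
    frontier = list(order)
    while frontier:
        n = frontier.pop()
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                order.append(m)
                frontier.append(m)
    return order

order = topo_order(nodes, pred, succ)

# (1) Ancestor-reachability mask: entry (i, j) is True iff j can appear before
# i on some hypothesis (or i == j). Used as the self-attention mask, this lets
# a single Transformer pass over all lattice tokens approximate encoding each
# hypothesis in isolation.
ancestors = {n: set() for n in nodes}
for n in order:
    for p in pred[n]:
        ancestors[n] |= ancestors[p] | {p}
attn_mask = [[(j in ancestors[i]) or (i == j) for j in nodes] for i in nodes]

# (2) Token-factored reranking: assume the reranker maps each token's
# lattice-contextualized representation to a scalar; scores are faked here.
token_score = {0: 0.0, 1: 0.2, 2: 0.1, 3: 0.9, 4: 0.3, 5: 0.5, 6: 0.0}

# Because the hypothesis score is a sum of token scores, the best hypothesis is
# the max-weight start-to-end path, found by DP in topological order.
NEG_INF = float("-inf")
best = {n: (token_score[n] if not pred[n] else NEG_INF, None) for n in nodes}
for n in order:
    for p in pred[n]:
        cand = best[p][0] + token_score[n]
        if cand > best[n][0]:
            best[n] = (cand, p)

end = max((n for n in nodes if not succ[n]), key=lambda n: best[n][0])
path, n = [], end
while n is not None:
    path.append(n)
    n = best[n][1]
print([tokens[i] for i in reversed(path)])  # ['<s>', 'the', 'cat', 'sat', '</s>']
```

Where paths merge (e.g., "sat" above is reachable from both branches), the mask mixes context from multiple hypotheses, so each token's representation is only an approximation of its single-hypothesis encoding; the abstract's claim is that this degradation stays minimal while the cost drops to one Transformer pass over the whole lattice.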
Related papers
- Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach.
Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
arXiv Detail & Related papers (2025-01-15T04:09:21Z) - KL-geodesics flow matching with a novel sampling scheme [4.347494885647007]
Non-autoregressive language models generate all tokens simultaneously, offering potential speed advantages over traditional autoregressive models.
We investigate a conditional flow matching approach for text generation.
arXiv Detail & Related papers (2024-11-25T17:15:41Z) - Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73$\times$ to 1.96$\times$, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z) - Self-Consistent Decoding for More Factual Open Responses [28.184313177333642]
"Sample & Select" improves factuality by a 30% relative margin against decoders of DoLA, P-CRR, and S-CRR.
We collect human verifications of the generated summaries, confirming the factual superiority of our method.
arXiv Detail & Related papers (2024-03-01T17:31:09Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - KNN-LM Does Not Improve Open-ended Text Generation [34.86733697757264]
We study the generation quality of retrieval-augmented language models (LMs).
We find that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM.
We discover that the entropy of the retrieval distribution increases faster than that of the base LM as the generated sequence becomes longer.
arXiv Detail & Related papers (2023-05-24T01:48:33Z) - A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Completion [60.52730146391456]
We propose a new nonconvex low-rank regularizer, the "nuclear Frobenius norm" regularizer, which is scalable, adaptive, and sound.
It bypasses the computation of singular values and allows fast optimization.
It obtains state-of-the-art recovery performance while being the fastest among existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z)