EEL: Efficiently Encoding Lattices for Reranking
- URL: http://arxiv.org/abs/2306.00947v1
- Date: Thu, 1 Jun 2023 17:45:32 GMT
- Title: EEL: Efficiently Encoding Lattices for Reranking
- Authors: Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett
- Abstract summary: We use Transformers to efficiently encode lattices of generated outputs.
We combine this approach with a new class of token-factored rerankers (TFRs).
Our results show both substantial speedup compared to naive reranking and often better performance on downstream metrics than comparable approaches.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard decoding approaches for conditional text generation tasks typically
search for an output hypothesis with high model probability, but this may not
yield the best hypothesis according to human judgments of quality. Reranking to
optimize for "downstream" metrics can better optimize for quality, but many
metrics of interest are computed with pre-trained language models, which are
slow to apply to large numbers of hypotheses. We explore an approach for
reranking hypotheses by using Transformers to efficiently encode lattices of
generated outputs, a method we call EEL. With a single Transformer pass over
the entire lattice, we can approximately compute a contextualized
representation of each token as if it were only part of a single hypothesis in
isolation. We combine this approach with a new class of token-factored
rerankers (TFRs) that allow for efficient extraction of high reranker-scoring
hypotheses from the lattice. Empirically, our approach incurs minimal
degradation error compared to the exponentially slower approach of encoding
each hypothesis individually. When applying EEL with TFRs across three text
generation tasks, our results show both substantial speedup compared to naive
reranking and often better performance on downstream metrics than comparable
approaches.
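The key property the abstract describes is that a token-factored reranker's score decomposes over the tokens of a hypothesis, so the highest-scoring hypothesis can be pulled out of the lattice with a max-sum dynamic program rather than by scoring every path separately. The sketch below illustrates that extraction step on a toy lattice; the graph representation, score values, and function names are illustrative assumptions, not the paper's actual implementation (in EEL, the per-token scores would come from a single Transformer pass over the lattice).

```python
# Hedged sketch: extracting the highest-scoring hypothesis from a
# lattice under a token-factored reranker (TFR). Illustrative only.
from functools import lru_cache

def best_path(edges, token_score, start, end):
    """Max-sum dynamic program over a DAG lattice.

    edges: dict mapping node -> list of (next_node, token) transitions
    token_score: dict mapping (node, token) -> per-token reranker score
    Returns (total_score, token_sequence) of the best hypothesis.
    """
    @lru_cache(maxsize=None)
    def follow(node):
        if node == end:
            return 0.0, ()
        best = (float("-inf"), ())
        for nxt, tok in edges.get(node, []):
            score, path = follow(nxt)
            cand = (token_score[(node, tok)] + score, (tok,) + path)
            if cand[0] > best[0]:
                best = cand
        return best

    return follow(start)

# Toy lattice: two hypotheses sharing a prefix ("the cat sat" / "the cat ran").
edges = {0: [(1, "the")], 1: [(2, "cat")], 2: [(3, "sat"), (3, "ran")]}
scores = {(0, "the"): 0.1, (1, "cat"): 0.2, (2, "sat"): 0.5, (2, "ran"): 0.9}
total, tokens = best_path(edges, scores, start=0, end=3)  # picks "the cat ran"
```

Because the DP visits each lattice edge once, extraction cost grows with the size of the lattice rather than with the (potentially exponential) number of paths through it.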
Related papers
- Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73× to 1.96×, significantly surpassing standard speculative decoding.
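The saving a DAG over drafted hypotheses offers is that shared prefixes are verified once instead of once per hypothesis. The tiny count below illustrates that idea; the drafts and the prefix-counting scheme are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch: why organizing drafts as a DAG/trie saves verification
# work. Each distinct prefix is scored once, instead of once per draft.
drafts = [("the", "cat", "sat"), ("the", "cat", "ran"), ("the", "dog", "ran")]

# Naive verification: score every token of every hypothesis.
naive_steps = sum(len(d) for d in drafts)

# DAG verification: score each distinct prefix exactly once.
dag_nodes = {d[:i + 1] for d in drafts for i in range(len(d))}
dag_steps = len(dag_nodes)
```

Here the three drafts contain nine tokens in total, but only six distinct prefixes, so prefix-sharing cuts a third of the verification work in this toy case.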
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
- Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery [0.0]
We focus on distributed estimation and support recovery for high-dimensional linear quantile regression.
We transform the original quantile regression into the least-squares optimization.
An efficient algorithm is developed, which enjoys high computation and communication efficiency.
arXiv Detail & Related papers (2024-05-13T08:32:22Z)
- Self-Consistent Decoding for More Factual Open Responses [28.184313177333642]
"Sample & Select" improves factuality by a 30% relative margin over the DoLA, P-CRR, and S-CRR decoders.
We collect human verifications of the generated summaries, confirming the factual superiority of our method.
arXiv Detail & Related papers (2024-03-01T17:31:09Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt can leverage their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- KNN-LM Does Not Improve Open-ended Text Generation [34.86733697757264]
We study the generation quality of retrieval-augmented language models (LMs)
We find that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM.
We discover that the entropy of the retrieval distribution increases faster than that of the base LM as the generated sequence becomes longer.
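The mechanism this summary refers to is interpolating the base LM's next-token distribution with a retrieval (kNN) distribution. The sketch below illustrates the finding in miniature; the distributions, mixing weight, and function names are illustrative assumptions, not the paper's experimental setup.

```python
# Hedged sketch of kNN-LM interpolation and the entropy observation.
import math

def interpolate(p_lm, p_knn, lam):
    """Mixture p(tok) = lam * p_knn(tok) + (1 - lam) * p_lm(tok)."""
    return {tok: lam * p_knn.get(tok, 0.0) + (1 - lam) * p
            for tok, p in p_lm.items()}

def entropy(p):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

p_lm = {"cat": 0.7, "dog": 0.2, "car": 0.1}     # peaked base-LM distribution
p_knn = {"cat": 0.4, "dog": 0.35, "car": 0.25}  # flatter retrieval distribution
mixed = interpolate(p_lm, p_knn, lam=0.25)

# The blurb's observation in miniature: the retrieval distribution has
# higher entropy than the base LM's, so mixing it in flattens the
# resulting distribution, which can raise perplexity.
```

In this toy case the retrieval distribution's entropy exceeds the base LM's, and the mixture is correspondingly flatter than the base LM alone.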
arXiv Detail & Related papers (2023-05-24T01:48:33Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show that simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Completion [60.52730146391456]
We propose a new nonconvex low-rank regularizer called the "nuclear Frobenius norm" regularizer, which is adaptive and sound.
It bypasses the computation of singular values, allowing fast optimization.
It obtains state-of-the-art recovery performance while being the fastest among existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z)
- Self-Adversarial Learning with Comparative Discrimination for Text Generation [111.18614166615968]
We propose a novel self-adversarial learning (SAL) paradigm for improving GANs' performance in text generation.
During training, SAL rewards the generator when its currently generated sentence is found to be better than its previously generated samples.
Experiments on text generation benchmark datasets show that our proposed approach substantially improves both the quality and the diversity of generated text.
arXiv Detail & Related papers (2020-01-31T07:50:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.