ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking
Inference
- URL: http://arxiv.org/abs/2204.11458v1
- Date: Mon, 25 Apr 2022 06:26:29 GMT
- Title: ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking
Inference
- Authors: Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji
Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler
- Abstract summary: This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model in the form of document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
- Score: 70.36083572306839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art neural models typically encode document-query pairs using
cross-attention for re-ranking. To this end, models generally utilize an
encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach.
These paradigms, however, are not without flaws, i.e., running the model on all
query-document pairs at inference-time incurs a significant computational cost.
This paper proposes a new training and inference paradigm for re-ranking. We
propose to finetune a pretrained encoder-decoder model in the form of
document-to-query generation. Subsequently, we show that this encoder-decoder
architecture can be decomposed into a decoder-only language model during
inference. This results in significant inference time speedups since the
decoder-only architecture only needs to learn to interpret static encoder
embeddings during inference. Our experiments show that this new paradigm
achieves results that are comparable to the more expensive cross-attention
ranking approaches while being up to 6.8X faster. We believe this work paves
the way for more efficient neural rankers that leverage large pretrained
models.
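The core inference trick lends itself to a short illustration: documents are encoded once offline, and at query time only the decoder runs, scoring each candidate by the likelihood of the query given the cached document representation. The sketch below is a minimal approximation of that idea using an off-the-shelf T5 checkpoint from Hugging Face Transformers; the checkpoint name, helper functions, and the use of cached encoder states in place of the paper's full decoder-only decomposition are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of ED2LM-style scoring: rank documents by the log-likelihood
# of the query under a seq2seq model finetuned for document-to-query generation,
# reusing precomputed (static) document encoder states so that only the decoder
# runs per query-document pair. "t5-base" is a generic stand-in checkpoint.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = T5Tokenizer.from_pretrained("t5-base")  # assumed stand-in, not the paper's model
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

@torch.no_grad()
def precompute_doc_states(documents):
    """Offline step: encode each document once and cache the encoder states."""
    enc = tokenizer(documents, return_tensors="pt", padding=True, truncation=True)
    states = model.encoder(**enc).last_hidden_state
    return [(states[i:i + 1], enc.attention_mask[i:i + 1]) for i in range(len(documents))]

@torch.no_grad()
def score(query, doc_state, doc_mask):
    """Online step: query log-likelihood given cached document states (decoder only)."""
    labels = tokenizer(query, return_tensors="pt").input_ids
    out = model(
        encoder_outputs=BaseModelOutput(last_hidden_state=doc_state),
        attention_mask=doc_mask,
        labels=labels,
    )
    # out.loss is the mean token cross-entropy; negate and rescale to a summed log-prob.
    return -out.loss.item() * labels.shape[1]

docs = ["ED2LM decomposes an encoder-decoder into a decoder-only LM.",
        "Diffusion models denoise images over many time steps."]
cached = precompute_doc_states(docs)
ranking = sorted(range(len(docs)),
                 key=lambda i: score("faster document re-ranking", *cached[i]),
                 reverse=True)
print(ranking)
```

In the paper's formulation the cached encoder outputs are consumed by what is effectively a decoder-only language model, so the per-query cost is dominated by decoding the short query rather than re-encoding the long document; the sketch above only mimics that behavior by caching encoder states.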
Related papers
- Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z)
- Hierarchical Attention Encoder Decoder [2.4366811507669115]
Autoregressive modeling can generate complex and novel sequences that have many real-world applications.
These models must generate outputs autoregressively, which becomes time-consuming when dealing with long sequences.
We propose a model based on the Hierarchical Recurrent Decoder architecture.
arXiv Detail & Related papers (2023-06-01T18:17:23Z)
- Improving Code Search with Hard Negative Sampling Based on Fine-tuning [15.341959871682981]
We introduce a cross-encoder architecture for code search that jointly encodes the concatenation of query and code.
We also introduce a Retriever-Ranker (RR) framework that cascades the dual-encoder and cross-encoder to improve the efficiency of evaluation and online serving (a minimal cascade sketch follows this list).
arXiv Detail & Related papers (2023-05-08T07:04:28Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data [145.95460945321253]
We introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes.
The proposed Speech2C reduces the word error rate (WER) by a relative 19.2% over the method without decoder pre-training.
arXiv Detail & Related papers (2022-03-31T15:33:56Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- Regularized Forward-Backward Decoder for Attention Models [5.257115841810258]
We propose a novel regularization technique incorporating a second decoder during the training phase.
This decoder is optimized on time-reversed target labels beforehand and supports the standard decoder during training by adding knowledge from future context.
We evaluate our approach on the smaller TEDLIUMv2 and the larger LibriSpeech dataset, achieving consistent improvements on both of them.
arXiv Detail & Related papers (2020-06-15T16:04:16Z)
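As a side note on the Retriever-Ranker cascade mentioned in the code-search entry above, the pattern itself is straightforward to sketch: a fast dual-encoder narrows the candidate pool, then a slower cross-encoder re-scores only the survivors. The snippet below uses generic public text-ranking checkpoints from the sentence-transformers library purely as stand-ins; it is not the paper's code-search implementation.

```python
# Illustrative retrieve-then-rerank cascade: dual-encoder retrieval followed by
# cross-encoder re-scoring. Checkpoints are generic public models used as stand-ins.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")                  # dual-encoder (assumed stand-in)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")      # cross-encoder (assumed stand-in)

corpus = [
    "def binary_search(arr, x): ...",
    "def quicksort(arr): ...",
    "def read_csv(path): ...",
]
query = "sort a list quickly"

# Stage 1: cheap dense retrieval over the whole corpus.
corpus_emb = retriever.encode(corpus, convert_to_tensor=True)
query_emb = retriever.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]

# Stage 2: expensive cross-encoder scores only the retrieved candidates.
pairs = [(query, corpus[h["corpus_id"]]) for h in hits]
scores = reranker.predict(pairs)
for (q, doc), s in sorted(zip(pairs, scores), key=lambda t: t[1], reverse=True):
    print(f"{s:.3f}  {doc}")
```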
This list is automatically generated from the titles and abstracts of the papers on this site.