SDR: Efficient Neural Re-ranking using Succinct Document Representation
- URL: http://arxiv.org/abs/2110.02065v1
- Date: Sun, 3 Oct 2021 07:43:16 GMT
- Title: SDR: Efficient Neural Re-ranking using Succinct Document Representation
- Authors: Nachshon Cohen, Amit Portnoy, Besnik Fetahu, and Amir Ingber
- Abstract summary: We propose the Succinct Document Representation scheme that computes highly compressed intermediate document representations.
Our method is highly efficient, achieving 4x-11.6x better compression rates for the same ranking quality.
- Score: 4.9278175139681215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BERT based ranking models have achieved superior performance on various
information retrieval tasks. However, the large number of parameters and
complex self-attention operation come at a significant latency overhead. To
remedy this, recent works propose late-interaction architectures, which allow
pre-computation of intermediate document representations, thus reducing the
runtime latency. Nonetheless, having solved the immediate latency issue, these
methods now introduce storage costs and network fetching latency, which limits
their adoption in real-life production systems.
In this work, we propose the Succinct Document Representation (SDR) scheme
that computes highly compressed intermediate document representations,
mitigating the storage/network issue. Our approach first reduces the dimension
of token representations by encoding them using a novel autoencoder
architecture that uses the document's textual content in both the encoding and
decoding phases. After this token encoding step, we further reduce the size of
entire document representations using a modern quantization technique.
Extensive evaluations on passage re-ranking on the MSMARCO dataset show
that compared to existing approaches using compressed document representations,
our method is highly efficient, achieving 4x-11.6x better compression rates for
the same ranking quality.
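As a rough illustration of the two-stage pipeline the abstract describes, the sketch below reduces token-vector dimensionality and then product-quantizes the result. All shapes, the random projection standing in for the learned autoencoder, and the toy codebook "training" are placeholders, not the paper's actual components (in particular, SDR's autoencoder also conditions on the document's text, which is omitted here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for contextual token representations of one document
# (e.g. 128 tokens x 768 dims from a BERT-style encoder).
doc_tokens = rng.normal(size=(128, 768)).astype(np.float32)

# Stage 1: dimension reduction. A random projection stands in for the
# paper's learned autoencoder.
proj = (rng.normal(size=(768, 64)) / np.sqrt(768)).astype(np.float32)
reduced = doc_tokens @ proj                      # (128, 64)

# Stage 2: product quantization. Split each 64-d vector into 8 sub-vectors
# and store only a 1-byte centroid index per sub-vector.
n_sub, sub_dim = 8, 8

def toy_codebooks(x):
    # Toy "training": sample centroids directly from the data.
    books = []
    for s in range(n_sub):
        block = x[:, s * sub_dim:(s + 1) * sub_dim]
        idx = rng.choice(len(block), size=min(256, len(block)), replace=False)
        books.append(block[idx])
    return books

def pq_encode(x, books):
    codes = np.empty((len(x), n_sub), dtype=np.uint8)
    for s, book in enumerate(books):
        block = x[:, s * sub_dim:(s + 1) * sub_dim]
        dists = ((block[:, None, :] - book[None, :, :]) ** 2).sum(-1)
        codes[:, s] = dists.argmin(axis=1)
    return codes

codes = pq_encode(reduced, toy_codebooks(reduced))
print(f"original: {doc_tokens.nbytes} bytes, compressed: {codes.nbytes} bytes "
      f"({doc_tokens.nbytes / codes.nbytes:.0f}x smaller)")
```

In this toy setting, 128 tokens of 768 float32 dimensions (about 393 KB) shrink to 8 bytes of codes per token (1 KB); the absolute ratio is illustrative only and not the paper's reported 4x-11.6x comparison against other compressed representations.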
Related papers
- ε-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality (FID).
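A toy sketch of the "denoising as decoding" idea: instead of a single decoder pass, start from noise and iteratively refine it while conditioning on the encoder's latent. The module, the refinement schedule, and all shapes below are invented for illustration and are not the paper's architecture.

```python
import torch
import torch.nn as nn

class LatentConditionedDenoiser(nn.Module):
    """Toy stand-in for a denoiser that refines a noisy image given the
    encoder's latent code (the real model is a full diffusion decoder)."""
    def __init__(self, img_dim=256, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + latent_dim, 512),
            nn.ReLU(),
            nn.Linear(512, img_dim),
        )

    def forward(self, noisy, latent):
        return self.net(torch.cat([noisy, latent], dim=-1))

def decode_by_denoising(denoiser, latent, img_dim=256, steps=8):
    # Start from pure noise and iteratively refine it, guided by the latent.
    x = torch.randn(latent.shape[0], img_dim)
    for t in range(steps):
        pred = denoiser(x, latent)
        weight = (t + 1) / steps   # crude schedule: trust the prediction more each step
        x = (1 - weight) * x + weight * pred
    return x

latents = torch.randn(4, 32)               # placeholder encoder outputs
recon = decode_by_denoising(LatentConditionedDenoiser(), latents)
print(recon.shape)                         # torch.Size([4, 256])
```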
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- DocMamba: Efficient Document Pre-training with State Space Model [56.84200017560988]
We present DocMamba, a novel framework based on the state space model.
It is designed to reduce computational complexity to linear while preserving global modeling capabilities.
Experiments on the HRDoc dataset confirm DocMamba's potential for length extrapolation.
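The linear-time property comes from recurrent state space scans rather than quadratic self-attention. The toy scalar recurrence below shows the general shape of such a scan; real SSM blocks such as Mamba use vector states and input-dependent parameters, which this sketch omits.

```python
import numpy as np

def ssm_scan(inputs, a, b, c):
    """One pass of the linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    Cost grows linearly with sequence length, unlike quadratic self-attention."""
    h, outputs = 0.0, []
    for x_t in inputs:
        h = a * h + b * x_t
        outputs.append(c * h)
    return np.array(outputs)

tokens = np.random.default_rng(0).normal(size=1024)   # toy scalar token features
y = ssm_scan(tokens, a=0.9, b=0.1, c=1.0)
print(y.shape)                                        # (1024,)
```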
arXiv Detail & Related papers (2024-09-18T11:34:28Z)
- Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer.
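A common lightweight scorer in this family is a MaxSim-style late interaction over precomputed token embeddings, sketched below. This illustrates the general late-interaction pattern, not the specific learnable scorer proposed in the cited paper, and all embeddings are random placeholders.

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction relevance: each query token embedding takes its
    maximum similarity over the document's token embeddings; sum the maxima."""
    sims = query_vecs @ doc_vecs.T          # (n_query_tokens, n_doc_tokens)
    return sims.max(axis=1).sum()

rng = np.random.default_rng(0)
query_embs = rng.normal(size=(8, 128))      # computed at query time
doc_embs = rng.normal(size=(180, 128))      # precomputed and stored offline
print(maxsim_score(query_embs, doc_embs))
```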
arXiv Detail & Related papers (2024-06-25T22:50:48Z)
- Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection [28.15184715270483]
Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility.
We propose a novel paradigm named Sparse RAG, which seeks to cut costs through sparsity.
Sparse RAG encodes retrieved documents in parallel, which eliminates the latency introduced by long-range attention over retrieved documents.
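A back-of-the-envelope illustration of why independent (parallel) encoding plus sparse selection cuts cost: joint attention over k concatenated documents scales with (kL)^2, while per-document encoding scales with k·L^2, and only a selected subset reaches the generator. The numbers and the random relevance scores below are placeholders, not measurements from the paper.

```python
import numpy as np

k, L = 10, 512                       # retrieved documents and tokens per document
joint_cost = (k * L) ** 2            # attention over one long concatenation
parallel_cost = k * L ** 2           # each document encoded independently
print(f"joint vs. parallel attention cost: {joint_cost / parallel_cost:.0f}x")

# Sparsity: keep only the documents judged useful before generation.
relevance = np.random.default_rng(0).random(k)   # placeholder relevance scores
kept = np.argsort(relevance)[-3:]                # e.g. pass only the top 3 onward
print("documents passed to the generator:", sorted(kept.tolist()))
```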
arXiv Detail & Related papers (2024-05-25T11:10:04Z)
- Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding [23.061797784952855]
This paper introduces PAG, a novel optimization and decoding approach that guides autoregressive generation of document identifiers.
Experiments on MSMARCO and TREC Deep Learning Track data reveal that PAG outperforms the state-of-the-art generative retrieval model by a large margin.
arXiv Detail & Related papers (2024-04-22T21:50:01Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
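The sketch below shows a generic draft-then-verify loop that can accept several successive tokens per step. It conveys the multi-token-per-pass goal but not the paper's actual hidden-state transfer mechanism, and both toy functions are stand-ins for a real model.

```python
def model_next_token(tokens):
    """Stand-in for the LLM's next-token choice (a toy deterministic rule)."""
    return (sum(tokens) + 1) % 50

def cheap_draft(tokens, n):
    """Stand-in for a cheap guess of the next n tokens at once."""
    return [(tokens[-1] + i + 1) % 50 for i in range(n)]

def multi_token_decode(prompt, draft_len=4, max_len=16):
    tokens = list(prompt)
    while len(tokens) < max_len:
        # One "forward pass" checks a whole draft; keep the agreeing prefix,
        # then fall back to the model's own token at the first mismatch.
        for tok in cheap_draft(tokens, draft_len):
            model_tok = model_next_token(tokens)
            tokens.append(tok if tok == model_tok else model_tok)
            if tok != model_tok or len(tokens) >= max_len:
                break
    return tokens[:max_len]

print(multi_token_decode([3, 7]))
```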
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- DocDiff: Document Enhancement via Residual Diffusion Models [7.972081359533047]
We propose DocDiff, a diffusion-based framework specifically designed for document enhancement problems.
DocDiff consists of two modules: the Coarse Predictor (CP) and the High-Frequency Residual Refinement (HRR) module.
Our proposed HRR module in pre-trained DocDiff is plug-and-play and ready-to-use, with only 4.17M parameters.
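A minimal sketch of the coarse-plus-residual decomposition implied by the two modules: a smoothing pass stands in for the Coarse Predictor and an additive correction stands in for HRR. Neither is the paper's diffusion-based implementation; the 1-D signal and the fixed residual fraction are placeholders.

```python
import numpy as np

def coarse_predict(degraded):
    """Stand-in for the Coarse Predictor: recover low-frequency content
    (here just a moving-average smoothing of a 1-D signal)."""
    return np.convolve(degraded, np.ones(5) / 5, mode="same")

def refine_high_freq(coarse, degraded):
    """Stand-in for HRR: estimate the high-frequency detail the coarse pass
    missed and add it back (a learned module in the paper)."""
    return coarse + 0.5 * (degraded - coarse)

row = np.random.default_rng(0).random(64)   # 1-D stand-in for a document image row
restored = refine_high_freq(coarse_predict(row), row)
print(restored.shape)                       # (64,)
```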
arXiv Detail & Related papers (2023-05-06T01:41:10Z)
- Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition [1.7875811547963403]
Faster DAN is a two-step strategy to speed up the recognition process at prediction time.
It is at least 4 times faster on whole single-page and double-page images of the RIMES 2009, READ 2016 and MAURDOR datasets.
arXiv Detail & Related papers (2023-01-25T13:55:14Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work, we propose an alternative that doesn't force any structure in the search space: using all n-grams in a passage as its possible identifiers.
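A small illustration of treating every n-gram of a passage as a candidate identifier. The cited system matches generated n-grams against the corpus efficiently with an FM-index; this sketch uses in-memory sets and made-up passages purely for illustration.

```python
def passage_ngrams(passage, max_n=3):
    """Every word n-gram of a passage, each usable as a retrieval identifier."""
    words = passage.split()
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams

# Made-up two-document corpus.
corpus = {
    "d1": "neural ranking models score query document pairs",
    "d2": "succinct document representations reduce storage costs",
}
index = {doc_id: passage_ngrams(text) for doc_id, text in corpus.items()}

generated = "document representations"   # an identifier emitted by the language model
print([doc_id for doc_id, grams in index.items() if generated in grams])   # ['d2']
```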
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Compressibility of Distributed Document Representations [0.0]
CoRe is a representation learner-agnostic framework suitable for representation compression.
We show CoRe's behavior when considering contextual and non-contextual document representations, different compression levels, and 9 different compression algorithms.
Results based on more than 100,000 compression experiments indicate that CoRe offers a very good trade-off between compression efficiency and performance.
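In the same spirit, though not the paper's exact protocol, the snippet below measures how well a few standard-library compressors shrink a matrix of placeholder document embeddings, with and without 8-bit quantization. The embeddings, sizes, and compressor choices are illustrative assumptions only.

```python
import bz2
import lzma
import zlib

import numpy as np

# Placeholder "document representations": 1,000 documents x 300 dimensions.
embeddings = np.random.default_rng(0).normal(size=(1000, 300)).astype(np.float32)
raw = embeddings.tobytes()

# An 8-bit quantized variant, since quantization before general-purpose
# compression is one of the trade-offs such a study can measure.
scaled = (embeddings - embeddings.min()) / np.ptp(embeddings)
quantized = (scaled * 255).astype(np.uint8).tobytes()

for name, compress in (("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)):
    print(f"{name:4s}  float32: {len(raw) / len(compress(raw)):4.2f}x  "
          f"uint8: {len(quantized) / len(compress(quantized)):4.2f}x")
```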
arXiv Detail & Related papers (2021-10-14T17:56:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.