HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking
- URL: http://arxiv.org/abs/2205.10569v1
- Date: Sat, 21 May 2022 11:38:33 GMT
- Title: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking
- Authors: Yanzhao Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie
- Abstract summary: Hybrid List Aware Transformer Reranking (HLATR) is a subsequent reranking module to incorporate both retrieval and reranking stage features.
HLATR is lightweight and can be easily parallelized with existing text retrieval systems.
Empirical experiments on two large-scale text retrieval datasets show that HLATR can efficiently improve the ranking performance of existing multi-stage text retrieval methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep pre-trained language models (e.g., BERT) are effective at large-scale
text retrieval tasks. Existing text retrieval systems with state-of-the-art
performance usually adopt a retrieve-then-rerank architecture due to the
high computational cost of pre-trained language models and the large corpus
size. Under such a multi-stage architecture, previous studies have mainly focused
on optimizing a single stage of the framework to improve the overall retrieval
performance. However, how to directly couple multi-stage features for
optimization has not been well studied. In this paper, we design Hybrid List
Aware Transformer Reranking (HLATR) as a subsequent reranking module to
incorporate both retrieval and reranking stage features. HLATR is lightweight
and can be easily parallelized with existing text retrieval systems so that the
reranking process can be performed in a single, efficient pass.
Empirical experiments on two large-scale text retrieval datasets show that
HLATR can efficiently improve the ranking performance of existing multi-stage
text retrieval methods.
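The abstract only outlines HLATR, so the sketch below is a minimal, hypothetical reading of "hybrid list aware" reranking, not the authors' implementation. Assumptions: each candidate's input is a reranking-stage feature vector plus an embedding of its first-stage (retrieval) rank; a single self-attention layer with no learned query/key/value projections lets every candidate see the whole list; and a linear head produces the final score. The names `hybrid_list_scores`, `rank_table`, and `w_out` are illustrative; in practice these weights would be learned, and the paper's actual depth, features, and training loss are not described here.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hybrid_list_scores(rerank_feats, retrieval_ranks, rank_table, w_out):
    """Score a candidate list with one simplified self-attention layer (hypothetical).

    rerank_feats:    per-candidate feature vectors from the reranking stage (assumed).
    retrieval_ranks: 0-based position of each candidate in the first-stage list.
    rank_table:      one embedding row per first-stage rank (hypothetical, learned in practice).
    w_out:           linear scoring vector (hypothetical, learned in practice).
    """
    # Hybrid input: reranking-stage feature + embedding of the retrieval-stage rank.
    h = [[f + r for f, r in zip(feat, rank_table[rank])]
         for feat, rank in zip(rerank_feats, retrieval_ranks)]
    d = len(h[0])
    scores = []
    for q in h:
        # List-aware step: every candidate attends over the whole candidate list.
        attn = softmax([dot(q, k) / math.sqrt(d) for k in h])
        ctx = [sum(a * v[j] for a, v in zip(attn, h)) for j in range(d)]
        # Residual connection followed by a linear scoring head.
        scores.append(dot([x + c for x, c in zip(q, ctx)], w_out))
    return scores


def rerank(doc_ids, scores):
    # Final ordering: highest score first.
    ranked = sorted(zip(doc_ids, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked]
```

Because the retrieval rank travels with each candidate as an input feature, the scores do not depend on the order in which the list is fed in; the attention step is what lets one candidate's score depend on the others in the same list.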
Related papers
- CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters.
Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework.
Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on the CoIR benchmark.
Date: 2024-11-19T16:54:45Z
- Contextualization with SPLADE for High Recall Retrieval
High Recall Retrieval (HRR) is a search problem that optimizes the cost of retrieving most of the relevant documents in a given collection.
In this work, we leverage SPLADE, an efficient retrieval model that transforms documents into contextualized sparse vectors.
It reduces the review cost by 10% and 18% on two HRR evaluation collections, respectively, under a one-phase review workflow with a target recall of 80%.
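The review-cost figures can be made concrete with a small calculation. The sketch below assumes a simplified cost model in which reviewers read a ranked list from the top and stop as soon as the target recall is reached; `review_cost_at_recall` is a hypothetical helper, and the paper's exact cost definition may differ.

```python
def review_cost_at_recall(ranking, relevant, target_recall=0.8):
    """Number of documents read, top-down, before target_recall is reached.

    Simplified one-phase workflow (an assumption): the ranked list is reviewed
    in order and review stops once the recall target is met.
    Returns None if the ranking never reaches the target.
    """
    if not relevant:
        return 0
    found = 0
    for reviewed, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            # Recall so far = relevant documents found / all relevant documents.
            if found / len(relevant) >= target_recall:
                return reviewed
    return None
```

Under this model, a 10% cost reduction would mean a new ranking reaches 80% recall after reviewing 10% fewer documents than the baseline ranking.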
Date: 2024-05-07T03:05:37Z
- Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep Learning Track
We explain the hybrid text retrieval and multi-stage text ranking method adopted in our solution.
In the ranking stage, in addition to the full interaction-based ranking model built on a large pre-trained language model, we also propose a lightweight sub-ranking module.
Our models rank 1st on the passage ranking test set and 4th on the document ranking test set.
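The summary does not say how the hybrid retrieval results are combined. One common, simple way to merge a sparse and a dense ranked list is reciprocal rank fusion (RRF), sketched below as a swapped-in technique, not as the solution's actual method; `k=60` is the conventional RRF default rather than a value taken from this work.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion (RRF).

    Each list contributes 1 / (k + rank) to every document it contains,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a sparse list `["a", "b", "c"]` with a dense list `["b", "c", "d"]` promotes `b` and `c`, which both retrievers found.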
Date: 2023-08-23T09:56:59Z
- Zero-Shot Listwise Document Reranking with a Large Language Model
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
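Listwise reranking asks the model to reorder several passages at once rather than score each one independently. The sketch below assumes an answer format like `[2] > [1] > [3]` and a back-to-front sliding window so that long candidate lists fit in the prompt. `rank_window` is a hypothetical stand-in for the LLM call, and the window and step sizes are illustrative, not LRL's settings.

```python
import re


def parse_permutation(text, n):
    """Parse an answer such as "[2] > [3] > [1]" into 0-based indices.

    Duplicates and out-of-range ids are dropped; missing ids are appended in
    their original order so the result is always a full permutation of range(n).
    """
    seen = []
    for m in re.findall(r"\[(\d+)\]", text):
        i = int(m) - 1
        if 0 <= i < n and i not in seen:
            seen.append(i)
    return seen + [i for i in range(n) if i not in seen]


def sliding_window_rerank(docs, rank_window, window=4, step=2):
    """Rerank docs back-to-front with overlapping windows.

    rank_window(list_of_docs) -> answer string; a hypothetical stand-in for
    prompting the LLM with the query and the numbered passages.
    """
    docs = list(docs)
    end = len(docs)
    while True:
        start = max(0, end - window)
        chunk = docs[start:end]
        order = parse_permutation(rank_window(chunk), len(chunk))
        docs[start:end] = [chunk[i] for i in order]
        if start == 0:
            break
        end -= step
    return docs
```

The overlap between windows is what lets a relevant passage found near the bottom of the first-stage list move up into the next window and, eventually, to the top.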
Date: 2023-05-03T14:45:34Z
- UnifieR: A Unified Retriever for Large-Scale Retrieval
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
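A minimal sketch of scoring a query-document pair under both paradigms at once, assuming each side already carries a dense vector and a sparse `{term: weight}` lexicon vector. The linear interpolation and the `alpha` weight are illustrative assumptions; the summary does not say how UnifieR combines the two scores.

```python
def dense_score(q_vec, d_vec):
    # Dense-vector paradigm: inner product of two embeddings.
    return sum(a * b for a, b in zip(q_vec, d_vec))


def lexicon_score(q_weights, d_weights):
    # Lexicon-based paradigm: sparse {term: weight} vectors; only shared terms contribute.
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)


def unified_score(q, d, alpha=0.5):
    """Interpolate the two paradigms; alpha is a hypothetical mixing weight."""
    return (alpha * dense_score(q["dense"], d["dense"])
            + (1 - alpha) * lexicon_score(q["lex"], d["lex"]))
```

For example, a query `{"dense": [1.0, 0.0], "lex": {"cat": 2.0, "dog": 1.0}}` and a document `{"dense": [0.5, 0.5], "lex": {"cat": 1.0, "fish": 3.0}}` get a dense score of 0.5, a lexicon score of 2.0, and a unified score of 1.25 with `alpha=0.5`.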
Date: 2022-05-23T11:01:59Z
- Autoregressive Search Engines: Generating Substrings as Document Identifiers
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
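Using every n-gram of a passage as an identifier means any generated substring can point back to the passages that contain it. The sketch below replaces the efficient substring index such a system would need with a plain in-memory dictionary and scores passages by the lengths of the matched n-grams; both choices are assumptions for illustration, not the paper's scoring function.

```python
from collections import defaultdict


def build_ngram_index(passages, max_n=3):
    """Map every word n-gram (1..max_n) to the ids of passages containing it."""
    index = defaultdict(set)
    for pid, text in passages.items():
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                index[tuple(tokens[i:i + n])].add(pid)
    return index


def retrieve(index, generated_ngrams):
    """Score passages by the generated n-grams they contain (longer n-grams count more)."""
    scores = defaultdict(float)
    for ngram in generated_ngrams:
        key = tuple(ngram.lower().split())
        # .get avoids inserting unseen keys into the defaultdict.
        for pid in index.get(key, ()):
            scores[pid] += len(key)
    return sorted(scores, key=lambda p: (-scores[p], p))
```

Usage: with passages `{"p1": "the cat sat on the mat", "p2": "the dog sat on the log"}` and generated n-grams `["cat sat", "sat on the"]`, both passages match `"sat on the"` but only `p1` also matches `"cat sat"`, so `p1` ranks first.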
Date: 2022-04-22T10:45:01Z
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups demonstrate improved accuracy and large efficiency gains over state-of-the-art cross-encoders.
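The efficiency argument rests on running the expensive joint (cross-attention) model only on a short list produced by a cheap embedding model. A minimal sketch under that assumption, where `cross_score` is a hypothetical stand-in for a cross-encoder and item embeddings are precomputed; it illustrates the general retrieve-then-rerank pattern, not the paper's exact framework.

```python
import heapq


def retrieve_then_rerank(query_vec, item_vecs, cross_score, k=2):
    """Two-stage search: cheap dot-product retrieval, then expensive rescoring of the top k.

    item_vecs:   {item_id: precomputed embedding} (assumed bi-encoder outputs).
    cross_score: item_id -> float, a hypothetical stand-in for a cross-encoder
                 that sees the query and the item jointly.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Stage 1: bi-encoder scoring over the whole collection.
    shortlist = heapq.nlargest(k, item_vecs, key=lambda i: dot(query_vec, item_vecs[i]))
    # Stage 2: the cross-encoder runs on only k items instead of the full collection.
    return sorted(shortlist, key=cross_score, reverse=True)
```

The design trade-off is that anything the first stage misses can never be recovered by the second, so `k` balances recall against the number of expensive cross-encoder calls.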
Date: 2021-03-22T15:08:06Z
- Text Simplification by Tagging
We present TST, a simple and efficient Text Simplification system based on sequence Tagging.
Our system makes simplistic data augmentations and tweaks in training and inference on a pre-existing system.
Its inference is over 11 times faster than that of the current state-of-the-art text simplification system.
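Tagging-based simplification predicts one edit per input token instead of generating the output word by word, which is the likely source of its speed advantage. The sketch below only applies such tags to a sentence; the tag names (`KEEP`, `DELETE`, `REPLACE_<w>`, `APPEND_<w>`) are an assumed GECToR-style set, not necessarily TST's exact vocabulary, and the tagger that predicts them is not shown.

```python
def apply_tags(tokens, tags):
    """Apply one edit tag per source token and return the simplified token list.

    Tag set (assumed, GECToR-style): KEEP, DELETE, REPLACE_<w>, APPEND_<w>.
    """
    out = []
    for tok, tag in zip(tokens, tags):
        if tag == "KEEP":
            out.append(tok)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE_"):
            out.append(tag[len("REPLACE_"):])
        elif tag.startswith("APPEND_"):
            # Keep the token and insert a new word right after it.
            out.extend([tok, tag[len("APPEND_"):]])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out
```

For example, tagging "He utilized the apparatus in order to win" with `KEEP, REPLACE_used, KEEP, REPLACE_tool, DELETE, DELETE, KEEP, KEEP` yields "He used the tool to win".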
Date: 2021-03-08T20:57:55Z
- Pretrained Transformers for Text Ranking: BERT and Beyond
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
Date: 2020-10-13T15:20:32Z