Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep
Learning Track
- URL: http://arxiv.org/abs/2308.12039v1
- Date: Wed, 23 Aug 2023 09:56:59 GMT
- Title: Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep
Learning Track
- Authors: Guangwei Xu, Yangzhao Zhang, Longhui Zhang, Dingkun Long, Pengjun Xie,
Ruijie Guo
- Abstract summary: We explain the hybrid text retrieval and multi-stage text ranking method adopted in our solution.
In the ranking stage, in addition to the full interaction-based ranking model built on a large pre-trained language model, we also propose a lightweight sub-ranking module.
Our models achieve 1st and 4th place on the passage ranking and document ranking test sets, respectively.
- Score: 22.81602641419962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale text retrieval technology has been widely used in various
practical business scenarios. This paper presents our systems for the TREC 2022
Deep Learning Track. We explain the hybrid text retrieval and multi-stage text
ranking method adopted in our solution. The retrieval stage combines two
structures: traditional sparse retrieval and neural dense retrieval. In the
ranking stage, in addition to a full interaction-based ranking model built on
large pre-trained language models, we also propose a lightweight sub-ranking
module to further enhance the final text ranking performance. Evaluation
results demonstrate the effectiveness of our proposed approach. Our models
achieve 1st and 4th place on the passage ranking and document ranking test
sets, respectively.
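The hybrid retrieval stage described in the abstract combines scores from a sparse retriever and a dense retriever. The sketch below illustrates one common way to do this, min-max normalization followed by linear interpolation; the `alpha` weight and the fusion formula are illustrative assumptions, not the authors' exact method.

```python
# Illustrative sketch of hybrid sparse + dense score fusion.
# The interpolation weight and normalization are assumptions,
# not the fusion method used in the paper.

def minmax(scores):
    """Normalize a {doc_id: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {d: (s - lo) / span for d, s in scores.items()}

def hybrid_fuse(sparse_scores, dense_scores, alpha=0.5, k=10):
    """Linearly interpolate normalized sparse and dense scores.

    A document missing from one retriever's list contributes 0 for
    that component. Returns the top-k (doc_id, fused_score) pairs.
    """
    sp, de = minmax(sparse_scores), minmax(dense_scores)
    docs = set(sp) | set(de)
    fused = {d: alpha * sp.get(d, 0.0) + (1 - alpha) * de.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)[:k]
```

For example, a document ranked moderately by both retrievers can overtake one ranked first by only a single retriever, which is the usual motivation for fusing the two signals.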
Related papers
- mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval [67.50604814528553]
We first introduce a text encoder enhanced with RoPE and unpadding, pre-trained in a native 8192-token context.
Then we construct a hybrid TRM and a cross-encoder reranker by contrastive learning.
arXiv Detail & Related papers (2024-07-29T03:12:28Z) - PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods [0.552480439325792]
We present Polish Information Retrieval Benchmark (PIRB), a comprehensive evaluation framework encompassing 41 text information retrieval tasks for Polish.
The benchmark incorporates existing datasets as well as 10 new, previously unpublished datasets covering diverse topics such as medicine, law, business, physics, and linguistics.
We conduct an extensive evaluation of over 20 dense and sparse retrieval models, including the baseline models trained by us.
arXiv Detail & Related papers (2024-02-20T19:53:36Z) - Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
arXiv Detail & Related papers (2023-05-03T14:45:34Z) - UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z) - HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking [16.592276887533714]
Hybrid List Aware Transformer Reranking (HLATR) is a subsequent reranking module that incorporates features from both the retrieval and reranking stages.
HLATR is lightweight and can be easily parallelized with existing text retrieval systems.
Empirical experiments on two large-scale text retrieval datasets show that HLATR can efficiently improve the ranking performance of existing multi-stage text retrieval methods.
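HLATR's core idea is to fuse first-stage retrieval signals with reranker scores in a final, lightweight stage. HLATR itself learns this fusion with a small list-aware transformer; as a dependency-free stand-in, the sketch below fuses the first-stage rank (as a reciprocal-rank prior) with the reranker score using fixed weights, which are illustrative assumptions rather than the paper's learned parameters.

```python
# Simplified stand-in for a stage-feature-fusing sub-ranker.
# HLATR learns this combination with a small transformer; here the
# weights are fixed and purely illustrative.

def fuse_stage_features(candidates, w_rerank=1.0, w_rank=0.2):
    """Reorder candidates by fusing reranker scores with retrieval ranks.

    candidates: list of (doc_id, retrieval_rank, rerank_score), where
    retrieval_rank is 1-based from the first-stage retriever.
    Returns doc_ids sorted by the fused score, best first.
    """
    fused = [(doc, w_rerank * score + w_rank / rank)
             for doc, rank, score in candidates]
    fused.sort(key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in fused]
```

The reciprocal-rank term lets a document the first stage ranked highly recover from a slightly lower reranker score, which is the kind of cross-stage signal HLATR is designed to exploit.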
arXiv Detail & Related papers (2022-05-21T11:38:33Z) - PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking [20.260222175405215]
This paper describes the PASH participation in TREC 2021 Deep Learning Track.
In the recall stage, we adopt a scheme combining sparse and dense retrieval methods.
In the multi-stage ranking phase, point-wise and pair-wise ranking strategies are used.
arXiv Detail & Related papers (2022-05-18T04:38:15Z) - Curriculum Learning for Dense Retrieval Distillation [20.25741148622744]
We propose a generic curriculum learning based optimization framework called CL-DRD.
CL-DRD controls the difficulty level of training data produced by the re-ranking (teacher) model.
Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
arXiv Detail & Related papers (2022-04-28T17:42:21Z) - Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z) - DeText: A Deep Text Ranking Framework with BERT [20.26046057139722]
In this paper, we investigate how to build an efficient BERT-based ranking model for industry use cases.
The solution is further extended to a general ranking framework, DeText, that is open sourced and can be applied to various ranking productions.
arXiv Detail & Related papers (2020-08-06T05:12:11Z) - A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z) - Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.