Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep
Learning Track
- URL: http://arxiv.org/abs/2308.12039v1
- Date: Wed, 23 Aug 2023 09:56:59 GMT
- Title: Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep
Learning Track
- Authors: Guangwei Xu, Yangzhao Zhang, Longhui Zhang, Dingkun Long, Pengjun Xie,
Ruijie Guo
- Abstract summary: We explain the hybrid text retrieval and multi-stage text ranking method adopted in our solution.
In the ranking stage, in addition to the full interaction-based ranking model built on a large pre-trained language model, we also propose a lightweight sub-ranking module.
Our models achieve 1st and 4th place on the passage ranking and document ranking test sets, respectively.
- Score: 22.81602641419962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale text retrieval technology has been widely used in various
practical business scenarios. This paper presents our systems for the TREC 2022
Deep Learning Track. We explain the hybrid text retrieval and multi-stage text
ranking method adopted in our solution. The retrieval stage combines two
structures: traditional sparse retrieval and neural dense retrieval. In the
ranking stage, in addition to a full interaction-based ranking model built on
large pre-trained language models, we also propose a lightweight sub-ranking
module to further enhance the final text ranking performance. Evaluation
results demonstrate the effectiveness of our proposed approach. Our models
achieve 1st and 4th place on the passage ranking and document ranking test
sets, respectively.
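The hybrid retrieval stage described in the abstract combines scores from a sparse retriever and a dense retriever. The sketch below illustrates one common way to do this, min-max normalization followed by linear interpolation; the `alpha` weight and the fusion formula are illustrative assumptions, not the authors' exact method.

```python
# Illustrative sketch of hybrid sparse + dense score fusion.
# The interpolation weight and normalization are assumptions,
# not the fusion method used in the paper.

def minmax(scores):
    """Normalize a {doc_id: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {d: (s - lo) / span for d, s in scores.items()}

def hybrid_fuse(sparse_scores, dense_scores, alpha=0.5, k=10):
    """Linearly interpolate normalized sparse and dense scores.

    A document missing from one retriever's list contributes 0 for
    that component. Returns the top-k (doc_id, fused_score) pairs.
    """
    sp, de = minmax(sparse_scores), minmax(dense_scores)
    docs = set(sp) | set(de)
    fused = {d: alpha * sp.get(d, 0.0) + (1 - alpha) * de.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)[:k]
```

For example, a document ranked moderately by both retrievers can overtake one ranked first by only a single retriever, which is the usual motivation for fusing the two signals.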
Related papers
- mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval [67.50604814528553]
We first introduce a text encoder enhanced with RoPE and unpadding, pre-trained in a native 8192-token context.
Then we construct a hybrid TRM and a cross-encoder reranker by contrastive learning.
arXiv Detail & Related papers (2024-07-29T03:12:28Z) - PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods [0.552480439325792]
We present Polish Information Retrieval Benchmark (PIRB), a comprehensive evaluation framework encompassing 41 text information retrieval tasks for Polish.
The benchmark incorporates existing datasets as well as 10 new, previously unpublished datasets covering diverse topics such as medicine, law, business, physics, and linguistics.
We conduct an extensive evaluation of over 20 dense and sparse retrieval models, including the baseline models trained by us.
arXiv Detail & Related papers (2024-02-20T19:53:36Z) - Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
arXiv Detail & Related papers (2023-05-03T14:45:34Z) - UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z) - HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking [16.592276887533714]
Hybrid List Aware Transformer Reranking (HLATR) is a subsequent reranking module that incorporates features from both the retrieval and reranking stages.
HLATR is lightweight and can be easily parallelized with existing text retrieval systems.
Empirical experiments on two large-scale text retrieval datasets show that HLATR can efficiently improve the ranking performance of existing multi-stage text retrieval methods.
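HLATR's core idea is to fuse first-stage retrieval signals with reranker scores in a final, lightweight stage. HLATR itself learns this fusion with a small list-aware transformer; as a dependency-free stand-in, the sketch below fuses the first-stage rank (as a reciprocal-rank prior) with the reranker score using fixed weights, which are illustrative assumptions rather than the paper's learned parameters.

```python
# Simplified stand-in for a stage-feature-fusing sub-ranker.
# HLATR learns this combination with a small transformer; here the
# weights are fixed and purely illustrative.

def fuse_stage_features(candidates, w_rerank=1.0, w_rank=0.2):
    """Reorder candidates by fusing reranker scores with retrieval ranks.

    candidates: list of (doc_id, retrieval_rank, rerank_score), where
    retrieval_rank is 1-based from the first-stage retriever.
    Returns doc_ids sorted by the fused score, best first.
    """
    fused = [(doc, w_rerank * score + w_rank / rank)
             for doc, rank, score in candidates]
    fused.sort(key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in fused]
```

The reciprocal-rank term lets a document the first stage ranked highly recover from a slightly lower reranker score, which is the kind of cross-stage signal HLATR is designed to exploit.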
arXiv Detail & Related papers (2022-05-21T11:38:33Z) - PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking [20.260222175405215]
This paper describes the PASH participation in TREC 2021 Deep Learning Track.
In the recall stage, we adopt a scheme combining sparse and dense retrieval methods.
In the multi-stage ranking phase, point-wise and pair-wise ranking strategies are used.
arXiv Detail & Related papers (2022-05-18T04:38:15Z) - Curriculum Learning for Dense Retrieval Distillation [20.25741148622744]
We propose a generic curriculum learning based optimization framework called CL-DRD.
CL-DRD controls the difficulty level of training data produced by the re-ranking (teacher) model.
Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
arXiv Detail & Related papers (2022-04-28T17:42:21Z) - Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z) - DeText: A Deep Text Ranking Framework with BERT [20.26046057139722]
In this paper, we investigate how to build an efficient BERT-based ranking model for industry use cases.
The solution is further extended to a general ranking framework, DeText, that is open sourced and can be applied to various ranking productions.
arXiv Detail & Related papers (2020-08-06T05:12:11Z) - A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z) - Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.