Related papers: DeText: A Deep Text Ranking Framework with BERT

DeText: A Deep Text Ranking Framework with BERT

URL: http://arxiv.org/abs/2008.02460v1
Date: Thu, 6 Aug 2020 05:12:11 GMT
Title: DeText: A Deep Text Ranking Framework with BERT
Authors: Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, Ananth Sankar, Zimeng Yang, Qi Guo, Liang Zhang, Bo Long, Bee-Chung Chen and Deepak Agarwal
Abstract summary: In this paper, we investigate how to build an efficient BERT-based ranking model for industry use cases. The solution is further extended to a general ranking framework, DeText, that is open sourced and can be applied to various rankingproductions.
Score: 20.26046057139722
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ranking is the most important component in a search system. Mostsearch systems deal with large amounts of natural language data,hence an effective ranking system requires a deep understandingof text semantics. Recently, deep learning based natural languageprocessing (deep NLP) models have generated promising results onranking systems. BERT is one of the most successful models thatlearn contextual embedding, which has been applied to capturecomplex query-document relations for search ranking. However,this is generally done by exhaustively interacting each query wordwith each document word, which is inefficient for online servingin search product systems. In this paper, we investigate how tobuild an efficient BERT-based ranking model for industry use cases.The solution is further extended to a general ranking framework,DeText, that is open sourced and can be applied to various rankingproductions. Offline and online experiments of DeText on threereal-world search systems present significant improvement overstate-of-the-art approaches.

Related papers

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Knowledge Bases. Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z)
LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
List K-kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance. There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results. We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep Learning Track [22.81602641419962]
We explain the hybrid text retrieval and multi-stage text ranking method adopted in our solution. In the ranking stage, in addition to the full interaction-based ranking model built on large pre-trained language model, we also proposes a lightweight sub-ranking module. Our models achieve the 1st and 4th rank on the test set of passage ranking and document ranking respectively.
arXiv Detail & Related papers (2023-08-23T09:56:59Z)
Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search [12.701416688678622]
shallow model FastText is widely used for efficient online inference. BERT is an effective solution, but it will cause a higher online inference latency and more expensive computing costs. We propose knowledge condensation to boost the classification performance of the online FastText model under strict low latency constraints.
arXiv Detail & Related papers (2023-08-02T12:05:01Z)
Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data. Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
Hierarchical Neural Network Approaches for Long Document Classification [3.6700088931938835]
We employ pre-trained Universal Sentence (USE) and Bidirectional Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently. Our proposed models are conceptually simple where we divide the input data into chunks and then pass this through base models of BERT and USE. We show that USE + CNN/LSTM performs better than its stand-alone baseline. Whereas the BERT + CNN/LSTM performs on par with its stand-alone counterpart.
arXiv Detail & Related papers (2022-01-18T07:17:40Z)
Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms. Under a deep generative framework, our system jointly optimize a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time. Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
Table Search Using a Deep Contextualized Language Model [20.041167804194707]
In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval. We propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT.
arXiv Detail & Related papers (2020-05-19T04:18:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.