Effective and Efficient Query-aware Snippet Extraction for Web Search
- URL: http://arxiv.org/abs/2210.08809v1
- Date: Mon, 17 Oct 2022 07:46:17 GMT
- Title: Effective and Efficient Query-aware Snippet Extraction for Web Search
- Authors: Jingwei Yi, Fangzhao Wu, Chuhan Wu, Xiaolong Huang, Binxing Jiao, Guangzhong Sun, Xing Xie
- Abstract summary: We propose an effective query-aware webpage snippet extraction method named DeepQSE.
DeepQSE first learns query-aware sentence representations for each sentence to capture the fine-grained relevance between query and sentence.
We propose an efficient version of DeepQSE, named Efficient-DeepQSE, which can significantly improve the inference speed of DeepQSE without affecting its performance.
- Score: 61.60405035952961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query-aware webpage snippet extraction is widely used in search engines to
help users better understand the content of the returned webpages before
clicking. Although important, it is very rarely studied. In this paper, we
propose an effective query-aware webpage snippet extraction method named
DeepQSE, aiming to select a few sentences which can best summarize the webpage
content in the context of input query. DeepQSE first learns query-aware
sentence representations for each sentence to capture the fine-grained
relevance between query and sentence, and then learns document-aware
query-sentence relevance representations for snippet extraction. Since the
query and each sentence are jointly modeled in DeepQSE, its online inference
may be slow. Thus, we further propose an efficient version of DeepQSE, named
Efficient-DeepQSE, which can significantly improve the inference speed of
DeepQSE without affecting its performance. The core idea of Efficient-DeepQSE
is to decompose the query-aware snippet extraction task into two stages, i.e.,
a coarse-grained candidate sentence selection stage where sentence
representations can be cached, and a fine-grained relevance modeling stage.
Experiments on two real-world datasets validate the effectiveness and
efficiency of our methods.
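The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the bag-of-words `embed` function, the cosine-based `coarse_select`, and the token-overlap `fine_score` are all hypothetical stand-ins for the learned encoders in DeepQSE, chosen only so the example runs with the Python standard library. What the sketch does show is the control flow: sentence vectors are computed once and cached, a cheap first stage narrows the sentences to a few candidates, and a more expensive joint query-sentence scorer runs only on those candidates.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cache_sentence_vectors(sentences):
    """Offline step: sentence vectors do not depend on the query, so they can be cached."""
    return [embed(s) for s in sentences]


def coarse_select(query, cached_vectors, k):
    """Stage 1: score the query against cached vectors and keep the top-k sentence indices."""
    q = embed(query)
    scores = [cosine(q, v) for v in cached_vectors]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def fine_score(query, sentence):
    """Stage 2 placeholder for joint query-sentence modeling.

    Assumption: uses the fraction of query tokens that appear in the sentence.
    """
    q_tokens = set(embed(query))
    s_tokens = set(embed(sentence))
    return len(q_tokens & s_tokens) / len(q_tokens) if q_tokens else 0.0


def extract_snippet(query, sentences, cached_vectors, k=3, n=1):
    """Run both stages and return the n best sentences in document order."""
    candidates = coarse_select(query, cached_vectors, k)
    reranked = sorted(candidates, key=lambda i: fine_score(query, sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(reranked[:n])]


sentences = [
    "Our store opens at nine every morning.",
    "Returns are accepted within thirty days of purchase.",
    "Contact support to start a return or an exchange.",
    "We ship to most countries worldwide.",
]
cached = cache_sentence_vectors(sentences)  # computed once per webpage
print(extract_snippet("returns within thirty days", sentences, cached, k=2, n=1))
```

In this sketch the expensive scorer touches only `k` sentences per query instead of every sentence on the page, which is the source of the speedup the abstract attributes to caching plus candidate selection.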
Related papers
- Aligning Query Representation with Rewritten Query and Relevance Judgments in Conversational Search [32.35446999027349]
We leverage both rewritten queries and relevance judgments in the conversational search data to train a better query representation model.
The proposed model, Query Representation Alignment Conversational Retriever (QRACDR), is tested on eight datasets.
arXiv Detail & Related papers (2024-07-29T17:14:36Z) - Progressive Query Expansion for Retrieval Over Cost-constrained Data Sources [6.109188517569139]
ProQE is a progressive query expansion algorithm that iteratively expands the query as it retrieves more documents.
Our results show that ProQE outperforms state-of-the-art baselines by 37% and is the most cost-effective.
arXiv Detail & Related papers (2024-06-11T10:30:19Z) - User Intent Recognition and Semantic Cache Optimization-Based Query Processing Framework using CFLIS and MGR-LAU [0.0]
This work analyzed the informational, navigational, and transactional-based intents in queries for enhanced QP.
For efficient QP, the data is structured using Epanechnikov Kernel-Ordering Points To Identify the Clustering Structure (EK-OPTICS).
The extracted features, detected intents, and structured data are fed into the Multi-head Gated Recurrent Learnable Attention Unit (MGR-LAU).
arXiv Detail & Related papers (2024-06-06T20:28:05Z) - Selecting Query-bag as Pseudo Relevance Feedback for Information-seeking Conversations [76.70349332096693]
Information-seeking dialogue systems are widely used in e-commerce systems.
We propose a Query-bag based Pseudo Relevance Feedback framework (QB-PRF).
It constructs a query-bag with related queries to serve as pseudo signals to guide information-seeking conversations.
arXiv Detail & Related papers (2024-03-22T08:10:32Z) - LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
Top-k KNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z) - Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z) - Query-Response Interactions by Multi-tasks in Semantic Search for Chatbot Candidate Retrieval [12.615150401073711]
We propose a novel approach, called Multitask-based Semantic Search Neural Network (MSSNN) for candidate retrieval.
The method employs a Seq2Seq modeling task to learn a good query encoder, then performs a word prediction task to build response embeddings, and finally applies a simple matching model to form the dot-product scorer.
arXiv Detail & Related papers (2022-08-23T15:07:35Z) - Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z) - Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z)