Event-driven Real-time Retrieval in Web Search
- URL: http://arxiv.org/abs/2312.00372v2
- Date: Mon, 4 Dec 2023 11:42:35 GMT
- Title: Event-driven Real-time Retrieval in Web Search
- Authors: Nan Yang, Shusen Zhang, Yannan Zhang, Xiaoling Bai, Hualong Deng,
Tianhua Zhou and Jin Ma
- Abstract summary: This paper expands the query with event information that represents real-time search intent.
We further enhance the model's capacity for event representation through multi-task training.
Our proposed approach significantly outperforms existing state-of-the-art baseline methods.
- Score: 15.235255100530496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information retrieval in real-time search presents unique challenges distinct
from those encountered in classical web search. These challenges are
particularly pronounced due to the rapid change of user search intent, which is
influenced by the occurrence and evolution of breaking news events, such as
earthquakes, elections, and wars. Previous dense retrieval methods, which
primarily focused on static semantic representation, lack the capacity to
capture immediate search intent, leading to inferior performance in retrieving
the most recent event-related documents in time-sensitive scenarios. To address
this issue, this paper expands the query with event information that represents
real-time search intent. The Event information is then integrated with the
query through a cross-attention mechanism, resulting in a time-context query
representation. We further enhance the model's capacity for event
representation through multi-task training. Since publicly available datasets
such as MS-MARCO do not contain any event information on the query side and
have few time-sensitive queries, we design an automatic data collection and
annotation pipeline to address this issue, which includes ModelZoo-based Coarse
Annotation and LLM-driven Fine Annotation processes. In addition, we share the
training tricks such as two-stage training and hard negative sampling. Finally,
we conduct a set of offline experiments on a million-scale production dataset
to evaluate our approach and deploy an A/B testing in a real online system to
verify the performance. Extensive experimental results demonstrate that our
proposed approach significantly outperforms existing state-of-the-art baseline
methods.
Related papers
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrates promise in event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z) - Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z) - Event-enhanced Retrieval in Real-time Search [5.720930457681116]
Existing embedding-based retrieval models often face the "semantic drift" problem and insufficient focus on key information.
This paper proposes a novel approach called EER, which enhances real-time retrieval performance by improving the dual-encoder model.
We believe that this approach will provide new perspectives in the field of information retrieval.
arXiv Detail & Related papers (2024-04-09T03:47:48Z) - Exploring the Limits of Historical Information for Temporal Knowledge
Graph Extrapolation [59.417443739208146]
We propose a new event forecasting model based on a novel training framework of historical contrastive learning.
CENET learns both the historical and non-historical dependency to distinguish the most potential entities.
We evaluate our proposed model on five benchmark graphs.
arXiv Detail & Related papers (2023-08-29T03:26:38Z) - CAPSTONE: Curriculum Sampling for Dense Retrieval with Document
Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z) - RETE: Retrieval-Enhanced Temporal Event Forecasting on Unified Query
Product Evolutionary Graph [18.826901341496143]
Temporal event forecasting is a new user behavior prediction task in a unified query product evolutionary graph.
We propose a novel RetrievalEnhanced Event forecasting framework.
Unlike existing methods, we propose methods that enhance user representations via roughly connected entities in the whole graph.
arXiv Detail & Related papers (2022-02-12T19:27:56Z) - Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z) - Connecting Images through Time and Sources: Introducing Low-data,
Heterogeneous Instance Retrieval [3.6526118822907594]
We show that it is not trivial to pick features responding well to a panel of variations and semantic content.
Introducing a new enhanced version of the Alegoria benchmark, we compare descriptors using the detailed annotations.
arXiv Detail & Related papers (2021-03-19T10:54:51Z) - Event-Driven Query Expansion [23.08079115356717]
We propose a method to expand an event-related query by first detecting the events related to it.
We derive the candidates for expansion as terms semantically related to both the query and the events.
We show that our proposed method of leveraging events improves query expansion performance significantly compared with state-of-the-art methods on various newswire TREC datasets.
arXiv Detail & Related papers (2020-12-22T14:56:54Z) - Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.