A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval
- URL: http://arxiv.org/abs/2503.01003v1
- Date: Sun, 02 Mar 2025 19:59:41 GMT
- Title: A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval
- Authors: Dhairya Dalal, Sharmi Dev Gupta, Bentolhoda Binaei,
- Abstract summary: We present a unsupervised semantic search pipeline for the Causality-driven Adhoc Information Retrieval (CAIR-2021) shared task.<n>The CAIR shared task expands traditional information retrieval to support the retrieval of documents containing the likely causes of a query event.<n>A successful system must be able to distinguish between topical documents and documents containing causal descriptions of events that are causally related to the query event.
- Score: 1.1196974000738733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a unsupervised semantic search pipeline for the Causality-driven Adhoc Information Retrieval (CAIR-2021) shared task. The CAIR shared task expands traditional information retrieval to support the retrieval of documents containing the likely causes of a query event. A successful system must be able to distinguish between topical documents and documents containing causal descriptions of events that are causally related to the query event. Our approach involves aggregating results from multiple query strategies over a semantic and lexical index. The proposed approach leads the CAIR-2021 leaderboard and outperformed both traditional IR and pure semantic embedding-based approaches.
Related papers
- Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval [16.01726399448271]
SPIKE organizes documents into scenario, which encapsulates the reasoning process necessary to uncover implicit relationships between hypothetical information needs and document content.
During inference, SPIKE incorporates scenario-level relevance alongside document-level relevance, enabling reasoning-aware retrieval.
arXiv Detail & Related papers (2025-03-29T10:36:54Z) - QUIDS: Query Intent Generation via Dual Space Modeling [12.572815037915348]
We propose a dual-space model that uses semantic relevance and irrelevance information in the returned documents to explain the understanding of the query intent.
Our methods produce high-quality query intent descriptions, outperforming existing methods for this task, as well as state-of-the-art query-based summarization methods.
arXiv Detail & Related papers (2024-10-16T09:28:58Z) - Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$2$)
GR$2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z) - BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding.
We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
arXiv Detail & Related papers (2024-07-16T17:58:27Z) - Event GDR: Event-Centric Generative Document Retrieval [37.53593254200252]
We propose Event GDR, an event-centric generative document retrieval model.
We employ events and relations to model the document to guarantee the comprehensiveness and inner-content correlation.
For identifier construction, we map the events to well-defined event taxonomy to construct the identifiers with explicit semantic structure.
arXiv Detail & Related papers (2024-05-11T02:55:11Z) - ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
evaluation benchmark includes 3,452 high-quality exclusionary queries.
training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z) - CAPSTONE: Curriculum Sampling for Dense Retrieval with Document
Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z) - Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z) - Improving Query Representations for Dense Retrieval with Pseudo
Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z) - Event-Driven Query Expansion [23.08079115356717]
We propose a method to expand an event-related query by first detecting the events related to it.
We derive the candidates for expansion as terms semantically related to both the query and the events.
We show that our proposed method of leveraging events improves query expansion performance significantly compared with state-of-the-art methods on various newswire TREC datasets.
arXiv Detail & Related papers (2020-12-22T14:56:54Z) - Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z) - Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.