Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries
- URL: http://arxiv.org/abs/2505.22299v2
- Date: Thu, 29 May 2025 04:07:47 GMT
- Title: Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries
- Authors: Ganlin Xu, Zhoujia Zhang, Wangyi Mei, Jiaqing Liang, Weijia Lu, Xiaodong Zhang, Zhifei Yang, Xiaofeng Ma, Yanghua Xiao, Deqing Yang
- Abstract summary: Current dense retrievers retrieve the relevant documents within a corpus via embedding similarities. We propose a neuro-symbolic information retrieval method, namely NS-IR, to optimize the embeddings of naive natural language. Our experiments demonstrate that NS-IR achieves superior zero-shot retrieval performance on web search and low-resource retrieval tasks.
- Score: 36.93438185371322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information retrieval plays a crucial role in resource localization. Current dense retrievers retrieve relevant documents within a corpus via embedding similarities, which are computed between dense vectors and depend mainly on word co-occurrence between queries and documents while overlooking the real query intents. Thus, they often retrieve numerous irrelevant documents. Particularly for complex queries such as negative-constraint queries, their retrieval performance can be catastrophic. To address this issue, we propose a neuro-symbolic information retrieval method, namely NS-IR, that leverages first-order logic (FOL) to optimize the embeddings of naive natural language by considering the logical consistency between queries and documents. Specifically, we introduce two novel techniques, logic alignment and connective constraint, to rerank candidate documents, thereby enhancing retrieval relevance. Furthermore, we construct a new dataset, NegConstraint, including negative-constraint queries to evaluate NS-IR's performance on such complex IR scenarios. Our extensive experiments demonstrate that NS-IR not only achieves superior zero-shot retrieval performance on web search and low-resource retrieval tasks, but also performs better on negative-constraint queries. Our source code and dataset are available at https://github.com/xgl-git/NS-IR-main.
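To make the reranking idea concrete, below is a minimal sketch of logic-aware reranking in the spirit of NS-IR. The `consistency` function is a toy stand-in for the paper's logic alignment and connective constraint (which operate on FOL translations of queries and documents), and the blending weight `alpha` is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def consistency(negated_terms, doc_text):
    """Toy stand-in for logical consistency: the fraction of the query's
    negated terms that the document avoids mentioning. NS-IR instead
    compares FOL translations of the query and the document."""
    if not negated_terms:
        return 1.0
    hits = sum(term.lower() in doc_text.lower() for term in negated_terms)
    return 1.0 - hits / len(negated_terms)

def rerank(query_emb, negated_terms, docs, doc_embs, alpha=0.5):
    """Blend embedding similarity with the consistency score; alpha is
    an illustrative weight, not one from the paper."""
    scores = [
        (1 - alpha) * cosine(query_emb, emb)
        + alpha * consistency(negated_terms, text)
        for text, emb in zip(docs, doc_embs)
    ]
    return sorted(range(len(docs)), key=lambda i: -scores[i])  # best first

# Example: "dense retrievers that do NOT use query expansion"
docs = ["A dense retriever trained purely with contrastive learning.",
        "Dense retrieval improved by query expansion."]
rng = np.random.default_rng(0)
doc_embs = [rng.normal(size=8) for _ in docs]
print(rerank(rng.normal(size=8), ["query expansion"], docs, doc_embs))
```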
Related papers
- ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge [49.65993318863458]
ImpliRet is a benchmark that shifts the reasoning challenge to document-side processing. We evaluate a range of sparse and dense retrievers, all of which struggle in this setting.
arXiv Detail & Related papers (2025-06-17T11:08:29Z)
- ELITE: Embedding-Less retrieval with Iterative Text Exploration [5.8851517822935335]
Large Language Models (LLMs) have achieved impressive progress in natural language processing. Their limited ability to retain long-term context constrains performance on document-level or multi-turn tasks.
arXiv Detail & Related papers (2025-05-17T08:48:43Z)
- Cognitive-Aligned Document Selection for Retrieval-augmented Generation [2.9060210098040855]
We propose GGatrieval to dynamically update queries and filter for high-quality, reliable retrieved documents. We parse the user query into its syntactic components and perform fine-grained grounded alignment with the retrieved documents. Our approach introduces a novel criterion for filtering retrieved documents, closely emulating human strategies for acquiring targeted information. (A toy sketch follows this entry.)
arXiv Detail & Related papers (2025-02-17T13:00:15Z)
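As a rough illustration of GGatrieval's grounded-alignment idea, the sketch below splits a query into content-word components and keeps only documents that ground enough of them. The crude tokenizer stands in for the paper's syntactic parsing, and the threshold `tau` is a hypothetical parameter.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "that", "with"}

def components(query):
    """Crude stand-in for syntactic parsing: content-word components.
    GGatrieval parses real syntactic constituents instead."""
    return [w for w in re.findall(r"[a-z]+", query.lower())
            if w not in STOPWORDS]

def grounded_fraction(query, doc):
    """Fraction of query components literally supported by the document,
    a toy proxy for fine-grained grounded alignment."""
    comps = components(query)
    doc_lower = doc.lower()
    return sum(c in doc_lower for c in comps) / max(len(comps), 1)

def filter_docs(query, docs, tau=0.6):
    """Keep documents whose grounding exceeds a hypothetical threshold."""
    return [d for d in docs if grounded_fraction(query, d) >= tau]

print(filter_docs("negative constraint queries",
                  ["Handling negative constraints in queries.",
                   "A survey of image segmentation."]))
```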
- Constructing Set-Compositional and Negated Representations for First-Stage Ranking [23.123116796159717]
We introduce Disentangled Negation, which penalizes only the negated parts of a query, and a Combined Pseudo-Term approach that enhances LSRs' ability to handle intersections. We find that our zero-shot approach is competitive and often outperforms retrievers fine-tuned on compositional data. (A loose sketch follows this entry.)
arXiv Detail & Related papers (2025-01-13T20:32:38Z)
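A loose sketch of the disentangled-negation idea: the positive span of the query is scored normally while similarity to the negated span is subtracted, so only the negated content is penalized. The `embed` stub, the manual query split, and the penalty weight `beta` are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def embed(text):
    """Deterministic placeholder encoder; swap in any real bi-encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=16)
    return v / np.linalg.norm(v)

def disentangled_score(pos_part, neg_part, doc, beta=1.0):
    """Score the positive span normally and subtract similarity to the
    negated span, penalizing only the negated content."""
    d = embed(doc)
    score = float(embed(pos_part) @ d)
    if neg_part:
        score -= beta * float(embed(neg_part) @ d)
    return score

# "European movies not produced in France", split by hand for illustration
print(disentangled_score("European movies", "produced in France",
                         "A Spanish drama filmed in Madrid."))
```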
- GeAR: Generation Augmented Retrieval [82.20696567697016]
This paper introduces a novel method, GeAR (Generation Augmented Retrieval), which not only improves the global document-query similarity through contrastive learning, but also integrates well-designed fusion and decoding modules. When used as a retriever, GeAR does not incur any additional computational cost over bi-encoders.
arXiv Detail & Related papers (2025-01-06T05:29:00Z)
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding. We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
arXiv Detail & Related papers (2024-07-16T17:58:27Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
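The sketch below illustrates proposition-level indexing under a simplifying assumption: sentence splitting stands in for the trained proposition extractor the Dense X Retrieval paper uses, and each indexed unit keeps a pointer back to its source passage.

```python
import re

def propositions(passage):
    """Crude stand-in: sentence-split the passage. The paper instead uses
    a trained model to extract atomic, self-contained propositions."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage)
            if s.strip()]

def build_index(passages):
    """Index at proposition granularity, keeping a pointer back to the
    source passage so retrieval can return either unit."""
    index = []  # (proposition, passage_id) pairs
    for pid, passage in enumerate(passages):
        for prop in propositions(passage):
            index.append((prop, pid))
    return index

corpus = ["NS-IR reranks candidates. It checks logical consistency.",
          "Dense retrievers rely on embedding similarity."]
for prop, pid in build_index(corpus):
    print(pid, prop)
```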
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of state-of-the-art (SoTA) passage retrievers, we find that the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z)
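One plausible reading of CAPSTONE's curriculum sampling, sketched under stated assumptions: training starts mostly on generated pseudo queries and shifts probability mass toward the real query as it progresses. The linear schedule below is a guess for illustration, not the paper's actual rule.

```python
import random

def sample_query(real_query, pseudo_queries, step, total_steps):
    """Assumed curriculum: early steps mostly draw generated pseudo
    queries; the probability of drawing the real query grows linearly
    over training."""
    p_real = step / max(total_steps, 1)
    if random.random() < p_real:
        return real_query
    return random.choice(pseudo_queries)

pseudo = ["what is dense retrieval", "retrieval with document expansion"]
for step in (0, 500, 1000):
    print(step, sample_query("how are pseudo queries scheduled?",
                             pseudo, step, 1000))
```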
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z)
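A minimal sketch of the ANCE-PRF encoding step, assuming a vanilla bert-base-uncased checkpoint loaded via Hugging Face transformers: the query and its top feedback documents are concatenated with [SEP] and encoded once, with the [CLS] vector taken as the refined query embedding. The actual ANCE-PRF encoder is fine-tuned on relevance labels; this only shows the shape of the idea.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def prf_query_embedding(query, top_docs):
    """Concatenate the query and feedback documents with [SEP], encode
    once, and take the [CLS] vector as the refined query embedding."""
    text = f" {tokenizer.sep_token} ".join([query] + top_docs)
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        out = encoder(**inputs)
    return out.last_hidden_state[:, 0]
```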