ComLQ: Benchmarking Complex Logical Queries in Information Retrieval
- URL: http://arxiv.org/abs/2511.12004v2
- Date: Sun, 23 Nov 2025 06:31:37 GMT
- Title: ComLQ: Benchmarking Complex Logical Queries in Information Retrieval
- Authors: Ganlin Xu, Zhitao Yin, Linghao Zhang, Jiaqing Liang, Weijia Lu, Xiaodong Zhang, Zhifei Yang, Sihang Jiang, Deqing Yang
- Abstract summary: Information retrieval systems play a critical role in navigating information overload across various applications. Existing benchmarks cannot sufficiently evaluate the performance of IR models on complex queries in real-world scenarios. We propose a novel method leveraging large language models (LLMs) to construct a new IR dataset, ComLQ, for Complex Logical Queries.
- Score: 26.606215927237248
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Information retrieval (IR) systems play a critical role in navigating information overload across various applications. Existing IR benchmarks primarily focus on simple queries that are semantically analogous to single- and multi-hop relations, overlooking \emph{complex logical queries} involving first-order logic operations such as conjunction ($\land$), disjunction ($\lor$), and negation ($\lnot$). Thus, these benchmarks cannot sufficiently evaluate the performance of IR models on complex queries in real-world scenarios. To address this problem, we propose a novel method leveraging large language models (LLMs) to construct a new IR dataset \textbf{ComLQ} for \textbf{Com}plex \textbf{L}ogical \textbf{Q}ueries, which comprises 2,909 queries and 11,251 candidate passages. A key challenge in constructing the dataset lies in capturing the underlying logical structures within unstructured text. Therefore, by designing the subgraph-guided prompt with the subgraph indicator, an LLM (such as GPT-4o) is guided to generate queries with specific logical structures based on selected passages. Expert annotation ensures that all query-passage pairs in ComLQ satisfy \emph{structure conformity} and \emph{evidence distribution}. To better evaluate whether retrievers can handle queries with negation, we further propose a new evaluation metric, \textbf{Log-Scaled Negation Consistency} (\textbf{LSNC@$K$}). As a supplement to standard relevance-based metrics (such as nDCG and mAP), LSNC@$K$ measures whether top-$K$ retrieved passages violate negation conditions in queries. Our experimental results under zero-shot settings demonstrate existing retrieval models' limited performance on complex logical queries, especially on queries with negation, exposing their limited capability to model exclusion.
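The abstract does not give the exact formula for LSNC@$K$, but its stated intent (penalizing negation-violating passages among the top-$K$, with log scaling) can be sketched as follows. This is a hypothetical reconstruction using a DCG-style log discount, not the paper's actual definition; the function name and normalization are assumptions.

```python
import math

def lsnc_at_k(ranked_ids, violating_ids, k=10):
    """Illustrative log-scaled negation-consistency score.

    Violating passages are penalized more when they appear at higher
    ranks, using a DCG-style log discount; the score is normalized to
    [0, 1], where 1.0 means no violations among the top-K results.
    """
    top_k = ranked_ids[:k]
    # Penalty accumulated over violating passages (rank is 0-based).
    penalty = sum(
        1.0 / math.log2(rank + 2)
        for rank, pid in enumerate(top_k)
        if pid in violating_ids
    )
    # Worst case: every top-K passage violates the negation condition.
    worst = sum(1.0 / math.log2(r + 2) for r in range(len(top_k)))
    return 1.0 - penalty / worst if worst else 1.0
```

Under this sketch, a retriever that places a violating passage at rank 1 is penalized more heavily than one that places it at rank 10, matching the abstract's motivation of measuring how well retrievers model exclusion.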
Related papers
- SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables [13.249024309069236]
Table-Text question answering tasks require models that can reason across long text and source tables, traversing multiple hops and executing complex operations such as aggregation. We present SPARTA, an end-to-end construction framework that automatically generates large-scale Table-Text QA benchmarks with lightweight human validation. On SPARTA, state-of-the-art models that reach over 70 F1 on HybridQA or over 50 F1 on OTT-QA drop by more than 30 F1 points.
arXiv Detail & Related papers (2026-02-26T17:59:51Z) - ROG: Retrieval-Augmented LLM Reasoning for Complex First-Order Queries over Knowledge Graphs [14.25887925588904]
We propose a retrieval-augmented framework that combines query-aware neighborhood retrieval with large language model (LLM) chain-of-thought reasoning. ROG decomposes a multi-operator query into a sequence of single-operator sub-queries. Intermediate answer sets are cached and reused across steps, improving consistency on deep reasoning chains.
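ROG's decomposition into single-operator sub-queries with cached intermediate answer sets corresponds to a standard pattern: evaluate each operator over answer sets and memoize the results. The sketch below illustrates that general pattern only; the plan representation and function names are hypothetical, not ROG's actual API.

```python
def execute_plan(plan, retrieve):
    """Execute a multi-operator query as single-operator set steps.

    plan: list of (op, args) nodes; the last node is the query root.
          "atom" retrieves candidates, "and" intersects, "or" unions,
          "not" takes a set difference (base minus excluded).
    retrieve: callable mapping an atomic sub-query to a set of answers.
    """
    cache = {}  # intermediate answer sets, reused across steps

    def step(i):
        if i in cache:
            return cache[i]
        op, args = plan[i]
        if op == "atom":
            result = retrieve(args)
        elif op == "and":                 # conjunction = intersection
            result = step(args[0]) & step(args[1])
        elif op == "or":                  # disjunction = union
            result = step(args[0]) | step(args[1])
        elif op == "not":                 # negation = set difference
            result = step(args[0]) - step(args[1])
        else:
            raise ValueError(f"unknown operator: {op}")
        cache[i] = result
        return result

    return step(len(plan) - 1)
```

Caching means a sub-query shared by several branches (e.g. the same atom under both a conjunction and a negation) is retrieved once, which is the consistency benefit the abstract describes.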
arXiv Detail & Related papers (2026-02-02T17:45:43Z) - OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning [51.58235452818926]
We introduce OrLog, a neuro-symbolic retrieval framework that decouples predicate-level plausibility estimation from logical reasoning. A large language model (LLM) provides plausibility scores for atomic predicates in one decoding-free forward pass, from which a probabilistic reasoning engine derives the posterior probability of query satisfaction.
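The general idea of deriving query-satisfaction probability from atomic plausibility scores can be sketched with a simple recursive evaluator. This is a minimal illustration assuming independent predicates, not OrLog's actual reasoning engine; the expression encoding is hypothetical.

```python
def satisfy_prob(expr, p):
    """Probability that a candidate satisfies a logical query.

    expr: nested tuples, e.g. ("and", ("atom", "x"), ("not", ("atom", "y"))).
    p: dict mapping atomic predicate names to plausibility scores in [0, 1]
       (in OrLog's setting, these would come from an LLM forward pass).
    Assumes atomic predicates are independent.
    """
    op = expr[0]
    if op == "atom":
        return p[expr[1]]
    if op == "not":
        return 1.0 - satisfy_prob(expr[1], p)
    if op == "and":
        return satisfy_prob(expr[1], p) * satisfy_prob(expr[2], p)
    if op == "or":
        a = satisfy_prob(expr[1], p)
        b = satisfy_prob(expr[2], p)
        return a + b - a * b  # inclusion-exclusion under independence
    raise ValueError(f"unknown operator: {op}")
```

Decoupling scoring from reasoning this way means the LLM is queried once per atomic predicate, while conjunction, disjunction, and negation are resolved symbolically.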
arXiv Detail & Related papers (2026-01-30T15:31:58Z) - A Large Language Model Based Method for Complex Logical Reasoning over Knowledge Graphs [16.929901817693334]
Reasoning over knowledge graphs (KGs) with first-order logic (FOL) queries is challenging due to the inherent incompleteness of real-world KGs. We propose ROG (Reasoning Over knowledge Graphs with large language models), an ensemble-style framework that combines query-aware KG neighborhood retrieval with large language model (LLM)-based chain-of-thought reasoning.
arXiv Detail & Related papers (2025-12-22T07:01:05Z) - KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering [64.62317305868264]
We present KBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning. Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions. Experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-10T17:45:42Z) - Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries [36.93438185371322]
Current dense retrievers retrieve the relevant documents within a corpus via embedding similarities. We propose a neuro-symbolic information retrieval method, namely NS-IR, to optimize the embeddings of naive natural language. Our experiments demonstrate that NS-IR achieves superior zero-shot retrieval performance on web search and low-resource retrieval tasks.
arXiv Detail & Related papers (2025-05-28T12:37:09Z) - Neuro-Symbolic Query Compiler [57.78201019000895]
This paper presents QCompiler, a neuro-symbolic framework inspired by linguistic grammar rules and compiler design. It theoretically designs a minimal yet sufficient Backus-Naur Form (BNF) grammar $G[q]$ to formalize complex queries. The atomicity of the sub-queries in the leaves ensures more precise document retrieval and response generation, significantly improving the RAG system's ability to address complex queries.
arXiv Detail & Related papers (2025-05-17T09:36:03Z) - Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z) - BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding. We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
arXiv Detail & Related papers (2024-07-16T17:58:27Z) - HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs [9.559336828884808]
Large Language Models (LLMs) are adept at answering simple (single-hop) questions.
As the complexity of the questions increases, the performance of LLMs degrades.
Recent methods try to reduce this burden by integrating structured knowledge triples into the raw text.
We propose to use a knowledge graph (KG) that is context-aware and is distilled to contain query-relevant information.
arXiv Detail & Related papers (2024-06-10T05:22:49Z) - Prompt-fused framework for Inductive Logical Query Answering [31.736934787328156]
We propose a query-aware prompt-fused framework named Pro-QE.
We show that our model successfully handles the issue of unseen entities in logical queries.
arXiv Detail & Related papers (2024-03-19T11:30:30Z) - Reverse Engineering of Temporal Queries Mediated by LTL Ontologies [8.244587597395936]
In reverse engineering of database queries, we aim to construct a query from a given set of answers and non-answers.
We investigate this query-by-example problem for queries formulated in positive fragments of linear temporal logic over timestamped data.
arXiv Detail & Related papers (2023-05-02T08:27:39Z) - Knowledge Base Question Answering by Case-based Reasoning over Subgraphs [81.22050011503933]
We show that our model answers queries requiring complex reasoning patterns more effectively than existing KG completion algorithms.
The proposed model outperforms or performs competitively with state-of-the-art models on several KBQA benchmarks.
arXiv Detail & Related papers (2022-02-22T01:34:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.