Capturing P: On the Expressive Power and Efficient Evaluation of Boolean Retrieval
- URL: http://arxiv.org/abs/2601.18747v1
- Date: Mon, 26 Jan 2026 18:07:40 GMT
- Title: Capturing P: On the Expressive Power and Efficient Evaluation of Boolean Retrieval
- Authors: Amir Aavani,
- Abstract summary: Modern information retrieval is transitioning from simple document filtering to complex, neuro-symbolic reasoning.<n>We present a formal Retrieval Language ($mathcalL_R$) based on Directed Acyclic Graphs (DAGs) and prove it captures complexity class $mathbfP$.<n>This work establishes the theoretical foundation for turning the index into a general-purpose computational engine.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern information retrieval is transitioning from simple document filtering to complex, neuro-symbolic reasoning workflows. However, current retrieval architectures face a fundamental efficiency dilemma when handling the rigorous logical and arithmetic constraints required by this new paradigm. Standard iterator-based engines (Document-at-a-Time) do not natively support complex, nested logic graphs; forcing them to execute such queries typically results in intractable runtime performance. Conversely, naive recursive approaches (Term-at-a-Time), while capable of supporting these structures, suffer from prohibitive memory consumption when enforcing broad logical exclusions. In this paper, we propose that a retrieval engine must be capable of ``Capturing $\mathbf{P}$'' -- evaluating any polynomial-time property directly over its index in a computationally efficient manner. We define a formal Retrieval Language ($\mathcal{L}_R$) based on Directed Acyclic Graphs (DAGs) and prove it precisely captures the complexity class $\mathbf{P}$. We introduce \texttt{ComputePN}, a novel evaluation algorithm that makes $\mathcal{L}_R$ tractable. By combining native DAG traversal with a memory-efficient ``Positive-Negative'' response mechanism, \texttt{ComputePN} ensures the efficient evaluation of any query in $\mathcal{L}_R$. This work establishes the theoretical foundation for turning the search index into a general-purpose computational engine.
Related papers
- Rational Transductors [30.916600771560933]
We introduce emphRational Transductors, a dual-stream architecture that augments the Transformer with a matrix-valued recurrence.<n>By injecting rational state information into the attention mechanism via a emphDeep Rational Injection scheme, our framework strictly generalizes the expressive power of Transformers.
arXiv Detail & Related papers (2026-02-07T16:01:10Z) - Accelerated Evolving Set Processes for Local PageRank Computation [75.54334100808022]
This work proposes a novel framework based on nested evolving set processes to accelerate Personalized PageRank computation.<n>We show that the time complexity of such localized methods is upper bounded by $mintildemathcalO(R2/epsilon2), tildemathcalO(m)$ to obtain an $epsilon$-approximation of the PPR vector.
arXiv Detail & Related papers (2025-10-09T09:47:40Z) - You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures [16.867592142212203]
Large language models (LLMs) often suffer from hallucination, generating factually incorrect statements when handling questions beyond their knowledge.<n>Retrieval-augmented generation (RAG) addresses this by retrieving query-relevant contexts from knowledge bases to support LLM reasoning.<n>Existing Graph-based RAG methods rely on a costly process to transform the corpus into a graph, introducing overwhelming token cost and update latency.<n>We propose LogicRAG that dynamically extracts reasoning structures at inference time to guide adaptive retrieval without any pre-built graph.
arXiv Detail & Related papers (2025-08-08T08:07:40Z) - Neuro-Symbolic Query Compiler [57.78201019000895]
This paper presents QCompiler, a neuro-symbolic framework inspired by linguistic grammar rules and compiler design, to bridge this gap.<n>It theoretically designs a minimal yet sufficient Backus-Naur Form (BNF) grammar $G[q]$ to formalize complex queries.<n>The atomicity of the sub-queries in the leaf ensures more precise document retrieval and response generation, significantly improving the RAG system's ability to address complex queries.
arXiv Detail & Related papers (2025-05-17T09:36:03Z) - Enhancing Retrieval Systems with Inference-Time Logical Reasoning [9.526027847179677]
We propose a novel inference-time logical reasoning framework that explicitly incorporates logical reasoning into the retrieval process.<n>Our method extracts logical reasoning structures from natural language queries and then composes the individual cosine similarity scores to formulate the final document scores.
arXiv Detail & Related papers (2025-03-22T20:40:18Z) - COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - Synthesising Recursive Functions for First-Order Model Counting:
Challenges, Progress, and Conjectures [12.47276164048813]
First-order model counting (FOMC) is a computational problem that asks to count the models of a sentence in finite-domain first-order logic.
We relax the restrictions that typically accompany domain recursion to work with directed graphs that may contain cycles.
These improvements allow the algorithm to find efficient solutions to counting problems that were previously beyond its reach.
arXiv Detail & Related papers (2023-06-07T06:49:01Z) - Role Similarity Metric Based on Spanning Rooted Forest [25.35704965777052]
Existing role similarity metrics cannot handle top-k queries on large real-world networks due to the high time and space cost.
We prove that textsfForestSim is an admissible role similarity metric and devise the corresponding top-k similarity search algorithm.
The results show that textsfForestSim works efficiently on million-scale networks and achieves comparable performance to the state-of-art methods.
arXiv Detail & Related papers (2021-10-15T05:50:10Z) - Mixability made efficient: Fast online multiclass logistic regression [68.8204255655161]
We show that mixability can be a powerful tool to obtain algorithms with optimal regret.
The resulting methods often suffer from high computational complexity which has reduced their practical applicability.
arXiv Detail & Related papers (2021-10-08T08:22:05Z) - An Online Riemannian PCA for Stochastic Canonical Correlation Analysis [37.8212762083567]
We present an efficient algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices.
While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising.
arXiv Detail & Related papers (2021-06-08T23:38:29Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $mathcalF$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Superposition for Lambda-Free Higher-Order Logic [62.997667081978825]
We introduce refutationally complete superposition calculi for intentional and extensional clausal $lambda$-free higher-order logic.
The calculi are parameterized by a term order that need not be fully monotonic, making it possible to employ the $lambda$-free higher-order lexicographic path and Knuth-Bendix orders.
arXiv Detail & Related papers (2020-05-05T12:10:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.