Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning
- URL: http://arxiv.org/abs/2601.20014v1
- Date: Tue, 27 Jan 2026 19:41:10 GMT
- Title: Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning
- Authors: Shuhui Qu,
- Abstract summary: Inference-time planning with large language models frequently breaks under partial observability. We introduce Self-Querying Bidirectional Categorical Planning (SQ-BCP), which explicitly represents precondition status. We prove that when the verifier succeeds and hard constraints pass deterministic checks, accepted plans are compatible with goal requirements.
- Score: 1.8055130471307603
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Inference-time planning with large language models frequently breaks under partial observability: when task-critical preconditions are not specified at query time, models tend to hallucinate missing facts or produce plans that violate hard constraints. We introduce Self-Querying Bidirectional Categorical Planning (SQ-BCP), which explicitly represents precondition status (Sat/Viol/Unk) and resolves unknowns via (i) targeted self-queries to an oracle/user or (ii) bridging hypotheses that establish the missing condition through an additional action. SQ-BCP performs bidirectional search and invokes a pullback-based verifier as a categorical certificate of goal compatibility, while using distance-based scores only for ranking and pruning. We prove that when the verifier succeeds and hard constraints pass deterministic checks, accepted plans are compatible with goal requirements; under bounded branching and finite resolution depth, SQ-BCP finds an accepting plan when one exists. Across WikiHow and RecipeNLG tasks with withheld preconditions, SQ-BCP reduces resource-violation rates to 14.9% and 5.8% (vs. 26.0% and 15.7% for the best baseline), while maintaining competitive reference quality.
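The three-valued precondition bookkeeping described in the abstract can be illustrated with a toy Python sketch. It is not the authors' implementation: the names `Status`, `Action`, `resolve_unknowns` and `verify`, the recipe facts, and the step budget are all hypothetical. The sketch omits bidirectional search and distance-based ranking, and it substitutes a plain set-inclusion check for the paper's pullback-based verifier.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Set


class Status(Enum):
    """Three-valued precondition status, mirroring Sat/Viol/Unk."""
    SAT = "Sat"
    VIOL = "Viol"
    UNK = "Unk"


@dataclass
class Action:
    name: str
    preconditions: Set[str]
    effects: Set[str] = field(default_factory=set)


def precondition_status(fact: str, known: Dict[str, bool]) -> Status:
    """Unk if the fact was never specified, otherwise Sat or Viol."""
    if fact not in known:
        return Status.UNK
    return Status.SAT if known[fact] else Status.VIOL


def resolve_unknowns(
    action: Action,
    known: Dict[str, bool],
    ask: Callable[[str], Optional[bool]],
    bridges: Dict[str, Action],
) -> Optional[List[Action]]:
    """Make `action` applicable; return [bridging actions..., action] or None.

    Mutates `known` with any answers the oracle provides.
    """
    prefix: List[Action] = []
    for fact in sorted(action.preconditions):
        status = precondition_status(fact, known)
        if status is Status.UNK:
            answer = ask(fact)  # (i) targeted self-query; None means "cannot say"
            if answer is not None:
                known[fact] = answer
                status = precondition_status(fact, known)
        if status is Status.VIOL:
            return None  # known-false precondition: reject this branch
        if status is Status.UNK:
            bridge = bridges.get(fact)  # (ii) bridging action that establishes `fact`
            if bridge is None:
                return None
            prefix.append(bridge)
    return prefix + [action]


def verify(plan: List[Action], goal: Set[str], known: Dict[str, bool], max_steps: int) -> bool:
    """Deterministic acceptance check (a plain stand-in, NOT the pullback verifier).

    Rejects plans over the step budget, then simulates effects from the facts
    known to be true and checks every precondition and the goal by set inclusion.
    """
    if len(plan) > max_steps:  # hypothetical hard constraint
        return False
    state = {fact for fact, value in known.items() if value}
    for action in plan:
        if not action.preconditions <= state:
            return False
        state |= action.effects
    return goal <= state


if __name__ == "__main__":
    bake = Action("bake_cake", {"have_flour", "have_eggs", "oven_works"}, {"cake"})
    bridges = {"have_eggs": Action("buy_eggs", {"store_open"}, {"have_eggs"})}
    user_answers = {"have_flour": True, "oven_works": True}  # the user cannot say about eggs
    known = {"store_open": True}
    plan = resolve_unknowns(bake, known, user_answers.get, bridges)
    print([a.name for a in plan], verify(plan, {"cake"}, known, max_steps=3))
```

Running it prints `['buy_eggs', 'bake_cake'] True`: the user confirms flour and the oven but cannot answer about eggs, so a bridging `buy_eggs` step is added instead of assuming the eggs exist. Had the user answered `False` for `oven_works`, that precondition would be Viol and `resolve_unknowns` would return `None`.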
Related papers
- Fuzzy Categorical Planning: Autonomous Goal Satisfaction with Graded Semantic Constraints [1.8055130471307603]
Fuzzy Category-theoretic Planning (FCP). FCP composes plan quality via a Lukasiewicz t-norm and retains crisp executability checks via pullback verification. We evaluate on (i) public PDDL3 preference/oversubscription benchmarks and (ii) RecipeNLG-Subs, a missing-substitute recipe-planning benchmark.
Date: 2026-01-27T19:56:00Z
- KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering [64.62317305868264]
We present KBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via reinforcement learning. Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions. Experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance.
Date: 2025-12-10T17:45:42Z
- Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation. Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP. The result is a label-free, API-compatible gate for test-time filtering.
Date: 2025-09-26T23:40:47Z
- Evaluating List Construction and Temporal Understanding capabilities of Large Language Models [54.39278049092508]
Large Language Models (LLMs) are susceptible to hallucinations and errors, particularly on temporal understanding tasks. We propose the Time referenced List based Question Answering (TLQA) benchmark, which requires structured answers in list format aligned with corresponding time periods. We investigate the temporal understanding and list construction capabilities of state-of-the-art generative models on TLQA in closed-book and open-domain settings.
Date: 2025-06-26T21:40:58Z
- COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
Date: 2025-06-25T07:04:49Z
- Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models [20.810300785340072]
Conformal Prediction with Query Oracle (CPQ) is a framework characterizing the optimal interplay between these objectives. Our algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets.
Date: 2025-06-05T18:26:14Z
- BQSched: A Non-intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning [7.738546538164454]
A key issue in minimizing the overall makespan of data pipelines is the efficient scheduling of concurrent queries. To our knowledge, BQSched is the first non-intrusive batch query scheduler via reinforcement learning. Extensive experiments show that BQSched can significantly improve the efficiency and stability of batch query scheduling.
Date: 2025-04-27T07:49:01Z
- Belief-State Query Policies for User-Aligned POMDPs [18.821166966365315]
We present a novel framework for expressing users' constraints and preferences about agent behavior in a partially observable setting. We present the first formal analysis of such constraints and prove that while the expected cost function of a parameterized BSQ policy with respect to its parameters is not convex, it is piecewise constant. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior with guaranteed user alignment.
Date: 2024-05-24T20:04:51Z
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a new query performance prediction (QPP) framework using automatically generated relevance judgments (QPP-GenRE). QPP-GenRE decomposes QPP into independent subtasks of predicting the relevance of each item in a ranked list to a given query. We predict an item's relevance by using open-source large language models (LLMs) to ensure scientific relevance.
Date: 2024-04-01T09:33:05Z
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
Date: 2023-06-09T06:02:01Z
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
Date: 2020-04-30T09:10:57Z
- Conditional Self-Attention for Query-based Summarization [49.616774159367516]
We propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling.
Experiments on Debatepedia and HotpotQA benchmark datasets show CSA consistently outperforms the vanilla Transformer.
Date: 2020-02-18T02:22:31Z
- The Limits of Efficiency for Open- and Closed-World Query Evaluation Under Guarded TGDs [10.042878093985458]
Ontology-mediated querying and querying in the presence of constraints are two key database problems.
We study the limits of efficient query evaluation in the context of guarded and frontier-guarded TGDs and on UCQs as the actual queries.
Date: 2019-12-28T11:08:33Z