Related papers: Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning

URL: http://arxiv.org/abs/2601.20014v1
Date: Tue, 27 Jan 2026 19:41:10 GMT
Title: Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning
Authors: Shuhui Qu,
Abstract summary: Inference-time planning with large language models frequently breaks under partial observability. We introduce Self-Querying Bidirectional Categorical Planning (SQ-BCP), which explicitly represents precondition status. We prove that when the verifier succeeds and hard constraints pass deterministic checks, accepted plans are compatible with goal requirements.
Score: 1.8055130471307603
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Inference-time planning with large language models frequently breaks under partial observability: when task-critical preconditions are not specified at query time, models tend to hallucinate missing facts or produce plans that violate hard constraints. We introduce Self-Querying Bidirectional Categorical Planning (SQ-BCP), which explicitly represents precondition status (Sat/Viol/Unk) and resolves unknowns via (i) targeted self-queries to an oracle/user or (ii) bridging hypotheses that establish the missing condition through an additional action. SQ-BCP performs bidirectional search and invokes a pullback-based verifier as a categorical certificate of goal compatibility, while using distance-based scores only for ranking and pruning. We prove that when the verifier succeeds and hard constraints pass deterministic checks, accepted plans are compatible with goal requirements; under bounded branching and finite resolution depth, SQ-BCP finds an accepting plan when one exists. Across WikiHow and RecipeNLG tasks with withheld preconditions, SQ-BCP reduces resource-violation rates to 14.9% and 5.8% (vs. 26.0% and 15.7% for the best baseline), while maintaining competitive reference quality.
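The three-valued precondition status and the two resolution routes described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the names `Status`, `resolve`, and `acceptable`, the oracle interface (returning `True`, `False`, or `None`), and the `bridges` mapping are all hypothetical, and the sketch omits the bidirectional search, the pullback-based verifier, and the distance-based ranking entirely.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Status(Enum):
    SAT = "Sat"    # precondition known to hold
    VIOL = "Viol"  # precondition known to be violated
    UNK = "Unk"    # precondition not specified at query time


def resolve(
    preconditions: Dict[str, Status],
    oracle: Callable[[str], Optional[bool]],
    bridges: Dict[str, str],
) -> Tuple[Dict[str, Status], List[str]]:
    """Resolve Unk preconditions (hypothetical helper, not from the paper).

    First ask the oracle; if it does not confirm the condition, fall back to
    a bridging action that would establish it, when one is available.
    """
    resolved = dict(preconditions)
    extra_actions: List[str] = []
    for name, status in preconditions.items():
        if status is not Status.UNK:
            continue
        answer = oracle(name)  # True, False, or None if the oracle cannot tell
        if answer is True:
            resolved[name] = Status.SAT
        elif name in bridges:
            # Bridging: add an action that establishes the missing condition.
            extra_actions.append(bridges[name])
            resolved[name] = Status.SAT
        elif answer is False:
            resolved[name] = Status.VIOL
        # Otherwise the precondition stays Unk instead of being guessed.
    return resolved, extra_actions


def acceptable(resolved: Dict[str, Status]) -> bool:
    """Accept only when every precondition is Sat (assumed toy criterion)."""
    return all(s is Status.SAT for s in resolved.values())


# Toy recipe-style example with made-up precondition names.
pre = {
    "oven_available": Status.SAT,
    "has_eggs": Status.UNK,
    "has_butter": Status.UNK,
}
answers = {"has_eggs": True}  # the oracle knows nothing about butter
resolved, extra = resolve(pre, answers.get, {"has_butter": "buy butter"})
print(resolved, extra, acceptable(resolved))
```

The point of the sketch is that an unknown precondition is never silently assumed true: it is either confirmed by a query, established by an extra action, marked violated, or left as `Unk`, in which case the toy acceptance check fails.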

Related papers

Fuzzy Categorical Planning: Autonomous Goal Satisfaction with Graded Semantic Constraints [1.8055130471307603]
Fuzzy Category-theoretic Planning (FCP). FCP composes plan quality via the Łukasiewicz t-norm and retains crisp executability checks via pullback verification. We evaluate on (i) public PDDL3 preference/oversubscription benchmarks and (ii) RecipeNLG-Subs, a missing-substitute recipe-planning benchmark.
arXiv Detail & Related papers (2026-01-27T19:56:00Z)
KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering [64.62317305868264]
We present KBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning. Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions. Experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-10T17:45:42Z)
Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation. Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP. The result is a label-free, API-compatible gate for test-time filtering.
arXiv Detail & Related papers (2025-09-26T23:40:47Z)
Evaluating List Construction and Temporal Understanding capabilities of Large Language Models [54.39278049092508]
Large Language Models (LLMs) are susceptible to hallucinations and errors, particularly on temporal understanding tasks. We propose the Time-referenced List-based Question Answering (TLQA) benchmark, which requires structured answers in list format aligned with corresponding time periods. We investigate the temporal understanding and list construction capabilities of state-of-the-art generative models on TLQA in closed-book and open-domain settings.
arXiv Detail & Related papers (2025-06-26T21:40:58Z)
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models [20.810300785340072]
Conformal Prediction with Query Oracle (CPQ) is a framework characterizing the optimal interplay between these objectives. Our algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets.
arXiv Detail & Related papers (2025-06-05T18:26:14Z)
BQSched: A Non-intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning [7.738546538164454]
A key issue in minimizing the overall makespan of data pipelines is the efficient scheduling of concurrent queries. To our knowledge, BQSched is the first non-intrusive batch query scheduler via reinforcement learning. Extensive experiments show that BQSched can significantly improve the efficiency and stability of batch query scheduling.
arXiv Detail & Related papers (2025-04-27T07:49:01Z)
Belief-State Query Policies for User-Aligned POMDPs [18.821166966365315]
We present a novel framework for expressing users' constraints and preferences about agent behavior in a partially observable setting. We present the first formal analysis of such constraints and prove that while the expected cost function of a parameterized BSQ policy w.r.t. its parameters is not convex, it is piecewise constant. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior with guaranteed user alignment.
arXiv Detail & Related papers (2024-05-24T20:04:51Z)
Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a new query performance prediction (QPP) framework using automatically generated relevance judgments (QPP-GenRE). QPP-GenRE decomposes QPP into independent subtasks of predicting the relevance of each item in a ranked list to a given query. We predict an item's relevance by using open-source large language models (LLMs) to ensure scientific relevance.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution. We employ DETR-based encoder-decoder conditional queries to significantly reduce the entity label space as well. Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem. We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
Conditional Self-Attention for Query-based Summarization [49.616774159367516]
We propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling. Experiments on the Debatepedia and HotpotQA benchmark datasets show that CSA consistently outperforms the vanilla Transformer.
arXiv Detail & Related papers (2020-02-18T02:22:31Z)
The Limits of Efficiency for Open- and Closed-World Query Evaluation Under Guarded TGDs [10.042878093985458]
Ontology-mediated querying and querying in the presence of constraints are two key database problems. We study the limits of efficient query evaluation in the context of guarded and frontier-guarded TGDs and on UCQs as the actual queries.
arXiv Detail & Related papers (2019-12-28T11:08:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.