Fuzzy Categorical Planning: Autonomous Goal Satisfaction with Graded Semantic Constraints
- URL: http://arxiv.org/abs/2601.20021v1
- Date: Tue, 27 Jan 2026 19:56:00 GMT
- Title: Fuzzy Categorical Planning: Autonomous Goal Satisfaction with Graded Semantic Constraints
- Authors: Shuhui Qu
- Abstract summary: Fuzzy Category-theoretic Planning (FCP) composes plan quality via the Łukasiewicz t-norm and retains crisp executability checks via pullback verification. We evaluate on (i) public PDDL3 preference/oversubscription benchmarks and (ii) RecipeNLG-Subs, a missing-substitute recipe-planning benchmark.
- Score: 1.8055130471307603
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Natural-language planning often involves vague predicates (e.g., suitable substitute, stable enough) whose satisfaction is inherently graded. Existing category-theoretic planners provide compositional structure and pullback-based hard-constraint verification, but treat applicability as crisp, forcing thresholding that collapses meaningful distinctions and cannot track quality degradation across multi-step plans. We propose Fuzzy Category-theoretic Planning (FCP), which annotates each action (morphism) with a degree in [0,1], composes plan quality via the Łukasiewicz t-norm, and retains crisp executability checks via pullback verification. FCP grounds graded applicability from language using an LLM with k-sample median aggregation and supports meeting-in-the-middle search using residuum-based backward requirements. We evaluate on (i) public PDDL3 preference/oversubscription benchmarks and (ii) RecipeNLG-Subs, a missing-substitute recipe-planning benchmark built from RecipeNLG with substitution candidates from Recipe1MSubs and FoodKG. FCP improves success and reduces hard-constraint violations on RecipeNLG-Subs compared to LLM-only and ReAct-style baselines, while remaining competitive with classical PDDL3 planners.
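The composition machinery described in the abstract can be made concrete with a short sketch. The code below is a hedged illustration, not the authors' implementation: it assumes each step carries a degree in [0, 1], composes degrees with the Łukasiewicz t-norm, derives backward requirements from its residuum, and aggregates hypothetical LLM scores by median. All function names and example numbers are invented.

```python
# Minimal sketch of FCP-style graded plan composition (not the authors' code).
# Assumption: plan quality is the Lukasiewicz t-norm fold of step degrees.
from statistics import median

def t_norm(a: float, b: float) -> float:
    """Lukasiewicz t-norm: T(a, b) = max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def residuum(a: float, b: float) -> float:
    """Lukasiewicz residuum: a -> b = min(1, 1 - a + b).
    For b <= a this is the threshold degree x with T(a, x) = b."""
    return min(1.0, 1.0 - a + b)

def plan_quality(degrees: list[float]) -> float:
    """Fold step degrees into an overall plan quality; 1.0 is neutral."""
    q = 1.0
    for d in degrees:
        q = t_norm(q, d)
    return q

def backward_requirement(target: float, suffix_degrees: list[float]) -> float:
    """Degree an earlier step must reach so the remaining plan can still
    meet `target` quality (the residuum-based backward requirement)."""
    return residuum(plan_quality(suffix_degrees), target)

def grounded_degree(llm_samples: list[float]) -> float:
    """k-sample median aggregation of LLM-scored applicability."""
    return median(llm_samples)

# Hypothetical three-step recipe plan with graded substitution scores.
degrees = [grounded_degree([0.9, 0.85, 0.9]), 0.8, 0.95]
print(plan_quality(degrees))                    # 0.65
print(backward_requirement(0.5, [0.8, 0.95]))   # 0.75
```

Because the Łukasiewicz t-norm subtracts slack at every step, quality degradation accumulates across a multi-step plan instead of being collapsed by a crisp threshold.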
Related papers
- Rethinking the Trust Region in LLM Reinforcement Learning [72.25890308541334]
Proximal Policy Optimization (PPO) serves as the de facto standard algorithm for reinforcement learning with Large Language Models (LLMs). We propose Divergence Proximal Policy Optimization (DPPO), which substitutes clipping with a more principled divergence constraint. DPPO achieves superior training stability and efficiency compared to existing methods, offering a more robust foundation for RL-based fine-tuning (see the divergence-penalty sketch after this list).
arXiv Detail & Related papers (2026-02-04T18:59:04Z)
- Clipping-Free Policy Optimization for Large Language Models [30.663054788473598]
Reinforcement learning has become central to post-training large language models, but dominant algorithms rely on clipping mechanisms that introduce optimization issues at scale. We propose Clipping-Free Policy Optimization, which replaces clipping with a convex penalty derived from Total Variation divergence constraints (covered by the same sketch after this list).
arXiv Detail & Related papers (2026-01-30T10:32:37Z)
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z)
- Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering [52.69447404069251]
Large vision-language models (VLMs) have improved embodied question answering (EQA) agents by providing strong semantic priors for open-vocabulary reasoning. We propose Prune-Then-Plan, a framework that stabilizes exploration through step-level calibration.
arXiv Detail & Related papers (2025-11-24T22:50:50Z)
- Practical RAG Evaluation: A Rarity-Aware Set-Based Metric and Cost-Latency-Quality Trade-offs [0.0]
This paper addresses the guessing game in building production RAG: there is no standardized, reproducible way to build and audit golden sets. Rath-gs (MIT) is a lean golden-set pipeline with Plackett-Luce listwise refinement.
arXiv Detail & Related papers (2025-11-12T18:49:21Z)
- Automated Skill Decomposition Meets Expert Ontologies: Bridging the Granularity Gap with LLMs [1.2891210250935148]
This paper investigates automated skill decomposition using Large Language Models (LLMs). Our framework standardizes the pipeline from prompting and generation to normalization and alignment with ontology nodes. To evaluate outputs, we introduce two metrics: an F1-score that uses optimal embedding-based matching to assess content accuracy, and a hierarchy-aware F1-score that credits structurally correct placements to assess granularity (see the matching sketch after this list).
arXiv Detail & Related papers (2025-10-13T12:03:06Z)
- Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation. Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP. The result is a label-free, API-compatible gate for test-time filtering (see the quantile-gate sketch after this list).
arXiv Detail & Related papers (2025-09-26T23:40:47Z)
- Linear Preference Optimization: Decoupled Gradient Control via Absolute Regularization [13.97375970293678]
DPO (Direct Preference Optimization) has become a widely used offline preference optimization algorithm due to its simplicity and training stability. We propose Linear Preference Optimization (LPO), a novel alignment framework featuring three key innovations. First, we introduce gradient decoupling by replacing the log-sigmoid function with an absolute difference loss, thereby isolating the optimization dynamics. Second, we improve stability through an offset constraint combined with a positive regularization term to preserve the chosen response quality. Third, we implement controllable rejection suppression using gradient separation with straightforward estimation and a tunable coefficient that linearly regulates the descent of the rejection probability (see the loss sketch after this list).
arXiv Detail & Related papers (2025-08-20T10:17:29Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss (see the combined-objective sketch after this list).
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on AdaSO (the adaptive least absolute shrinkage and selection operator). Experiments show that the overfitting threat is non-negligible when using classical non-regularized LDR to solve MSLP. For the LHDP problem, our analysis highlights several benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z)
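The DPPO and Clipping-Free entries above both argue for replacing PPO's ratio clipping with an explicit divergence term. As a hedged sketch of that shared idea (not either paper's exact algorithm), the snippet below contrasts the standard clipped surrogate with a convex penalty on |ratio - 1|, a total-variation-style deviation of the importance ratio; the coefficient `beta` and the toy shapes are invented.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Standard PPO clipped surrogate (shown for contrast)."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()

def divergence_penalty_loss(logp_new, logp_old, adv, beta=1.0):
    """Clipping-free variant: keep the full policy-gradient signal and
    add a convex penalty on |ratio - 1| (a TV-style deviation)."""
    ratio = torch.exp(logp_new - logp_old)
    pg = -(ratio * adv).mean()
    penalty = beta * (ratio - 1.0).abs().mean()
    return pg + penalty

# Toy batch of per-token log-probs and advantages.
logp_old = torch.randn(8)
logp_new = logp_old + 0.1 * torch.randn(8)
adv = torch.randn(8)
print(ppo_clip_loss(logp_new, logp_old, adv).item())
print(divergence_penalty_loss(logp_new, logp_old, adv).item())
```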
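For the skill-decomposition entry, "optimal embedding-based matching" suggests a bipartite assignment between generated and gold skills. A minimal sketch under that assumption (the similarity threshold, embeddings, and function name are invented; this is not the paper's exact metric):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_f1(pred_emb, gold_emb, threshold=0.7):
    """F1 from an optimal one-to-one matching of predicted and gold
    skill embeddings by cosine similarity (Hungarian assignment)."""
    p = pred_emb / np.linalg.norm(pred_emb, axis=1, keepdims=True)
    g = gold_emb / np.linalg.norm(gold_emb, axis=1, keepdims=True)
    sim = p @ g.T                               # cosine similarities
    rows, cols = linear_sum_assignment(-sim)    # maximize similarity
    tp = int((sim[rows, cols] >= threshold).sum())
    precision = tp / len(pred_emb)
    recall = tp / len(gold_emb)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 3 predicted vs 2 gold skill embeddings.
rng = np.random.default_rng(0)
print(matching_f1(rng.normal(size=(3, 8)), rng.normal(size=(2, 8))))
```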
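The conformal-inference entry describes a label-free gate built from a calibration quantile. A minimal split-conformal sketch of the quantile-gate idea (generic, not the paper's bootstrapped variant; the uncertainty scores are invented):

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))/n empirical
    quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    q = math.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def gate(test_scores, threshold):
    """Accept a generation iff its uncertainty score passes the gate."""
    return test_scores <= threshold

# Toy uncertainty scores (e.g., negative mean log-likelihood).
cal = np.random.default_rng(1).uniform(size=200)
tau = conformal_threshold(cal, alpha=0.1)
print(tau, gate(np.array([0.2, 0.95]), tau))
```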
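For the LPO entry, the first innovation replaces DPO's log-sigmoid with an absolute-difference loss to decouple gradients on chosen and rejected responses. A hedged sketch of that single idea (the `target` offset is invented; this is not the full three-part LPO objective):

```python
import torch
import torch.nn.functional as F

def dpo_loss(margin, beta=0.1):
    """DPO: -log sigmoid(beta * margin), where margin is the implicit
    reward gap between chosen and rejected responses."""
    return -F.logsigmoid(beta * margin).mean()

def lpo_style_loss(margin, beta=0.1, target=1.0):
    """Absolute-difference variant: push the scaled margin toward a
    fixed target, so the gradient magnitude no longer couples chosen
    and rejected terms through a shared sigmoid factor."""
    return (beta * margin - target).abs().mean()

margin = torch.randn(4, requires_grad=True)
print(dpo_loss(margin).item(), lpo_style_loss(margin).item())
```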
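Finally, the overoptimization entry reduces to a simple combined objective: a preference-optimization loss plus a supervised (SFT) loss acting as an implicit regularizer. A minimal hedged sketch (the mixing weight `lam` is invented):

```python
import torch
import torch.nn.functional as F

def combined_loss(margin, chosen_logp, lam=0.5, beta=0.1):
    """DPO-style preference loss plus an SFT term: maximizing the
    likelihood of chosen responses regularizes against reward hacking."""
    pref = -F.logsigmoid(beta * margin).mean()
    sft = -chosen_logp.mean()  # negative log-likelihood of chosen
    return pref + lam * sft

margin = torch.randn(4)
chosen_logp = -torch.rand(4)  # log-probs are <= 0
print(combined_loss(margin, chosen_logp).item())
```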
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.