RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification
- URL: http://arxiv.org/abs/2510.26935v1
- Date: Thu, 30 Oct 2025 18:46:34 GMT
- Title: RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification
- Authors: Yunhao Yang, Neel P. Bhatt, Pranay Samineni, Rohan Siva, Zhanyang Wang, Ufuk Topcu
- Abstract summary: We introduce RepV, a neurosymbolic verifier that unifies both views by learning a latent space where safe and unsafe plans are linearly separable. RepV trains a lightweight projector that embeds each plan, together with a language model-generated rationale, into a low-dimensional space. RepV provides a probabilistic guarantee on the likelihood of correct verification based on its position in the latent space.
- Score: 17.66826792670962
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI systems migrate to safety-critical domains, verifying that their actions comply with well-defined rules remains a challenge. Formal methods provide provable guarantees but demand hand-crafted temporal-logic specifications, offering limited expressiveness and accessibility. Deep learning approaches enable evaluation of plans against natural-language constraints, yet their opaque decision process invites misclassifications with potentially severe consequences. We introduce RepV, a neurosymbolic verifier that unifies both views by learning a latent space where safe and unsafe plans are linearly separable. Starting from a modest seed set of plans labeled by an off-the-shelf model checker, RepV trains a lightweight projector that embeds each plan, together with a language model-generated rationale, into a low-dimensional space; a frozen linear boundary then verifies compliance for unseen natural-language rules in a single forward pass. Beyond binary classification, RepV provides a probabilistic guarantee on the likelihood of correct verification based on its position in the latent space. This guarantee enables a guarantee-driven refinement of the planner, improving rule compliance without human annotations. Empirical evaluations show that RepV improves compliance prediction accuracy by up to 15% compared to baseline methods while adding fewer than 0.2M parameters. Furthermore, our refinement framework outperforms ordinary fine-tuning baselines across various planning domains. These results show that safety-separable latent spaces offer a scalable, plug-and-play primitive for reliable neurosymbolic plan verification. Code and data are available at: https://repv-project.github.io/.
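The abstract names each moving part concretely, so a minimal sketch is possible. The following is a hypothetical Python illustration, not the authors' implementation: a small projector maps an embedding of a plan plus its rationale into a low-dimensional latent space, a frozen linear boundary classifies the result in one forward pass, and the signed distance to that boundary is turned into a probability. All dimensions, names, and the sigmoid calibration are assumptions.

```python
# Minimal sketch of a RepV-style verifier (hypothetical; names, dimensions,
# and the sigmoid calibration are assumptions, not the authors' code).
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Lightweight projector: plan+rationale embedding -> low-dim latent."""
    def __init__(self, in_dim=768, latent_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

projector = Projector()
# Frozen linear boundary w^T z + b = 0 in the latent space (never trained).
w = torch.tensor([1.0, -1.0])
b = torch.tensor(0.0)

def train_step(opt, emb, labels):
    """One step on seed plans labeled safe/unsafe by a model checker."""
    z = projector(emb)
    logits = z @ w + b                # proportional to signed distance
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def verify(emb):
    """Single forward pass: a label plus a distance-based probability."""
    z = projector(emb)
    margin = (z @ w + b) / w.norm()   # signed distance to the frozen boundary
    p_safe = torch.sigmoid(margin)    # assumed calibration from latent position
    return p_safe > 0.5, p_safe

# Toy usage: random vectors stand in for plan+rationale encodings.
opt = torch.optim.Adam(projector.parameters(), lr=1e-3)
emb = torch.randn(32, 768)
labels = torch.randint(0, 2, (32,)).float()
for _ in range(100):
    train_step(opt, emb, labels)
print(verify(torch.randn(4, 768)))
```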
Related papers
- Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI [6.5470636478144675]
Large Language Models (LLMs) show promise as planners for embodied AI, but they inherently lack formal reasoning. We introduce the Verifiable Iterative Refinement Framework (VIRF), a neuro-symbolic architecture that shifts the paradigm from passive safety gatekeeping to active collaboration.
arXiv Detail & Related papers (2026-02-09T08:11:36Z)
- Building a Foundational Guardrail for General Agentic Systems via Synthetic Data [76.18834864749606]
Because LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm. Existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. We introduce AuraGen, a controllable engine that synthesizes benign trajectories, injects category-labeled risks with calibrated difficulty, and filters outputs via an automated reward model.
arXiv Detail & Related papers (2025-10-10T18:42:32Z)
- LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation [4.29885665563186]
LATENTGUARD is a framework that combines behavioral alignment with supervised latent space control for interpretable and precise safety steering. Our results show significant improvements in both safety controllability and response interpretability without compromising utility.
arXiv Detail & Related papers (2025-09-24T07:31:54Z)
- Rethinking Safety in LLM Fine-tuning: An Optimization Perspective [56.31306558218838]
We show that poor optimization choices, rather than inherent trade-offs, often cause safety problems, measured as harmful responses to adversarial prompts. We propose a simple exponential moving average (EMA) momentum technique in parameter space that preserves safety performance; a minimal sketch of this update follows below. Our experiments on the Llama families across multiple datasets demonstrate that safety problems can largely be avoided without specialized interventions.
arXiv Detail & Related papers (2025-08-17T23:46:36Z)
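The entry above names a concrete mechanism, an exponential moving average over parameters. A minimal sketch, assuming a standard PyTorch fine-tuning loop; the momentum value and the placement of the update are illustrative, not the paper's exact recipe.

```python
# Hypothetical sketch of parameter-space EMA during fine-tuning
# (beta and update placement are assumptions, not the paper's recipe).
import copy
import torch

model = torch.nn.Linear(16, 2)           # stand-in for an LLM
ema_model = copy.deepcopy(model)         # slow weights, kept off the optimizer
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

@torch.no_grad()
def ema_update(beta=0.999):
    # theta_ema <- beta * theta_ema + (1 - beta) * theta
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(beta).add_(p, alpha=1.0 - beta)

for step in range(1000):
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    ema_update()                         # smooth the trajectory in parameter space

# Deploy ema_model: the averaged weights are the ones claimed to
# better preserve safety behavior after fine-tuning.
```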
- Core Safety Values for Provably Corrigible Agents [2.6451153531057985]
We introduce the first implementable framework for corrigibility, with provable guarantees in multi-step, partially observed environments. Our framework replaces a single reward with five *structurally separate* utility heads. For open-ended settings where adversaries can modify the agent, we prove that deciding whether an arbitrary post-hack agent will ever violate corrigibility is undecidable.
arXiv Detail & Related papers (2025-07-28T16:19:25Z)
- Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces [59.80143393787701]
Large language models (LLMs) can handle uncertainty and promise to accelerate replanning while lowering the barrier to entry. We introduce a neurosymbolic framework that pairs the accessibility of natural-language dialogue with verifiable guarantees on goal interpretation. A lightweight model, fine-tuned on just 100 uncertainty-filtered examples, surpasses the zero-shot performance of GPT-4.1 while cutting inference latency by nearly 50%.
arXiv Detail & Related papers (2025-07-15T14:24:01Z)
- COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate; a sketch of this calibration step follows below. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
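The calibration step admits a short sketch: estimate errors among accepted answers on a calibration set, and pick the most permissive confidence threshold whose high-probability upper bound on the true error rate stays below the target risk. The Clopper-Pearson interval and the threshold search below are assumptions about the general approach, not COIN's exact procedure.

```python
# Hypothetical sketch of COIN-style threshold calibration (the interval
# method and search strategy are assumptions).
import numpy as np
from scipy.stats import beta

def error_upper_bound(k, n, delta=0.05):
    """Clopper-Pearson upper bound on the true error rate, holding w.p. >= 1-delta."""
    if n == 0 or k >= n:
        return 1.0
    return float(beta.ppf(1.0 - delta, k + 1, n - k))

def calibrate_threshold(conf, correct, alpha=0.1, delta=0.05):
    """Lowest confidence threshold whose bounded error rate stays <= alpha."""
    for t in sorted(set(conf)):
        keep = conf >= t
        k = int((~correct[keep]).sum())     # errors among accepted answers
        if error_upper_bound(k, int(keep.sum()), delta) <= alpha:
            return t
    return float("inf")                     # abstain on everything

# Toy calibration data: model confidence and whether each answer was correct.
rng = np.random.default_rng(0)
conf = rng.uniform(size=2000)
correct = rng.uniform(size=2000) < conf     # higher confidence, more correct
t = calibrate_threshold(conf, correct)
print(f"answer only when confidence >= {t:.3f}")
```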
- Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling [74.41886258801209]
We propose a two-stage trajectory planning framework that decouples principle alignment from behavior learning. Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-05-23T09:22:19Z)
- Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity [0.659529078336196]
Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions. LLM hallucinations may result in robots confidently executing plans that are misaligned with user goals or even unsafe in critical scenarios. We propose introspective planning, a systematic approach that aligns an LLM's uncertainty with the inherent ambiguity of the task.
arXiv Detail & Related papers (2024-02-09T16:40:59Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy; a generic CBF safety-filter sketch follows below.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
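For a single constraint on a control-affine system, a CBF safety filter has a closed form. The sketch below shows that generic filter only; the paper's model-uncertainty-aware reformulation and event-triggering logic are not reproduced, and the dynamics, barrier, and gain are toy assumptions.

```python
# Generic CBF safety-filter sketch (closed form for one constraint); the
# paper's uncertainty-aware terms are intentionally omitted.
import numpy as np

alpha = 1.0                                   # class-K gain on h (assumed)

def h(x):                                     # barrier: stay outside the unit disk
    return x @ x - 1.0

def grad_h(x):
    return 2.0 * x

def safe_control(x, u_nom, f, g):
    """min ||u - u_nom||^2  s.t.  grad_h(x) @ (f(x) + g(x) u) >= -alpha * h(x)."""
    a = grad_h(x) @ g(x)                      # constraint row for u
    b = grad_h(x) @ f(x) + alpha * h(x)       # constant part of the constraint
    if a @ u_nom + b >= 0.0:                  # nominal control already safe
        return u_nom
    # Minimal-norm correction: project u_nom onto the half-space a @ u + b >= 0.
    return u_nom - (a @ u_nom + b) / (a @ a) * a

# Single integrator: x_dot = u.
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)
x = np.array([1.5, 0.0])
u = safe_control(x, np.array([-2.0, 0.0]), f, g)  # nominal pushes toward the disk
print(u)                                           # corrected to satisfy the CBF
```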
- Planning with Learned Dynamics: Probabilistic Guarantees on Safety and Reachability via Lipschitz Constants [7.216586291939535]
We present a method for feedback motion planning of systems with unknown dynamics.
We provide guarantees on safety, reachability, and goal stability.
We demonstrate our approach by planning using learned models of a 6D quadrotor and a 7DOF Kuka arm; a toy Lipschitz-bound sketch follows below.
arXiv Detail & Related papers (2020-10-18T14:17:53Z)
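The title's Lipschitz-based guarantee can be illustrated with a one-line bound: if a learned dynamics model errs by at most eps on its training samples and the error function is L-Lipschitz, the error at any state is at most eps plus L times the distance to the nearest sample. A toy sketch with assumed numbers, not the paper's exact construction:

```python
# Toy sketch of a Lipschitz-constant error bound for a learned dynamics model
# (eps, L, and the clearance margin are assumed numbers).
import numpy as np

def error_bound(x, samples, eps, L):
    """||f(x) - f_hat(x)|| <= eps + L * dist(x, nearest training sample)."""
    d = np.min(np.linalg.norm(samples - x, axis=1))
    return eps + L * d

samples = np.random.default_rng(1).uniform(-1, 1, size=(200, 2))
x = np.array([0.3, -0.2])
bound = error_bound(x, samples, eps=0.01, L=0.5)
# A planned step is certified safe if its clearance from obstacles exceeds
# the worst-case model error along the step.
clearance = 0.15
print("certified" if bound <= clearance else "cannot certify", bound)
```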
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.