Joint Verification and Refinement of Language Models for Safety-Constrained Planning
- URL: http://arxiv.org/abs/2410.14865v1
- Date: Fri, 18 Oct 2024 21:16:30 GMT
- Title: Joint Verification and Refinement of Language Models for Safety-Constrained Planning
- Authors: Yunhao Yang, William Ward, Zichao Hu, Joydeep Biswas, Ufuk Topcu,
- Abstract summary: We develop a method to generate executable plans and formally verify them against task-relevant safety specifications.
Given a high-level task description in natural language, the proposed method queries a language model to generate plans in the form of executable robot programs.
It then converts the generated plan into an automaton-based representation, allowing formal verification of the automaton against the specifications.
- Score: 21.95203475140736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although pre-trained language models can generate executable plans (e.g., programmatic policies) for solving robot tasks, the generated plans may violate task-relevant logical specifications due to the models' black-box nature. A significant gap remains between the language models' outputs and verifiable executions of plans. We develop a method to generate executable plans and formally verify them against task-relevant safety specifications. Given a high-level task description in natural language, the proposed method queries a language model to generate plans in the form of executable robot programs. It then converts the generated plan into an automaton-based representation, allowing formal verification of the automaton against the specifications. We prove that given a set of verified plans, the composition of these plans also satisfies the safety specifications. This proof ensures the safety of complex, multi-component plans, obviating the computation complexity of verifying the composed plan. We then propose an automated fine-tuning process that refines the language model to generate specification-compliant plans without the need for human labeling. The empirical results show a 30 percent improvement in the probability of generating plans that meet task specifications after fine-tuning.
Related papers
- AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent [80.83250816918861]
Large Reasoning Models (LRMs) like o3 and DeepSeek-R1 have achieved remarkable progress in natural language reasoning with long chain-of-thought.<n>However, they remain computationally inefficient and struggle with accuracy when solving problems requiring complex mathematical operations.<n>We present AgentMath, an agent framework that seamlessly integrates language models' reasoning capabilities with code interpreters' computational precision.
arXiv Detail & Related papers (2025-12-23T19:57:49Z) - Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study [0.0]
Large Language Models have sparked interest in their potential for robotic task planning.<n>While these models demonstrate strong generative capabilities, their effectiveness in producing structured and executable plans remains uncertain.<n>This paper presents a systematic evaluation of a broad spectrum of current state of the art language models.
arXiv Detail & Related papers (2025-07-31T14:25:54Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Towards High-Level Modelling in Automated Planning [0.36373653017571106]
Unified-Planning is a Python library offering high-level API to specify planning problems and to invoke automated planners.
In this paper, we present an extension of the UP library aimed at enhancing its expressivity for high-level problem modelling.
arXiv Detail & Related papers (2024-12-09T09:01:13Z) - Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows textcode-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z) - Ask-before-Plan: Proactive Language Agents for Real-World Planning [68.08024918064503]
Proactive Agent Planning requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction.
We propose a novel multi-agent framework, Clarification-Execution-Planning (textttCEP), which consists of three agents specialized in clarification, execution, and planning.
arXiv Detail & Related papers (2024-06-18T14:07:28Z) - NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z) - Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification [15.686651364655958]
AutoSpec is an automated approach to synthesize specifications for automated program verification.
It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof.
It can be successfully applied to verify the programs in a real-world X509-parser project.
arXiv Detail & Related papers (2024-03-31T18:15:49Z) - Probabilistically Correct Language-based Multi-Robot Planning using Conformal Prediction [11.614036749291216]
We introduce a new distributed multi-robot planner called S-ATLAS for Safe plAnning for Teams of Language-instructed AgentS.
We show that the proposed planner can achieve user-specified task success rates, assuming successful plan execution.
We provide comparative experiments against related works showing that our method is significantly more computational efficient and achieves lower help rates.
arXiv Detail & Related papers (2024-02-23T15:02:44Z) - Consolidating Trees of Robotic Plans Generated Using Large Language
Models to Improve Reliability [6.4111574364474215]
The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability.
This paper introduces an innovative approach aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios.
arXiv Detail & Related papers (2024-01-15T18:01:59Z) - Automated Process Planning Based on a Semantic Capability Model and SMT [50.76251195257306]
In research of manufacturing systems and autonomous robots, the term capability is used for a machine-interpretable specification of a system function.
We present an approach that combines these two topics: starting from a semantic capability model, an AI planning problem is automatically generated.
arXiv Detail & Related papers (2023-12-14T10:37:34Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning
Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - Fine-Tuning Language Models Using Formal Methods Feedback [53.24085794087253]
We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems.
The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions.
The results indicate an improvement in percentage of specifications satisfied by the controller from 60% to 90%.
arXiv Detail & Related papers (2023-10-27T16:24:24Z) - The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification [3.2925005312612323]
This paper presents the FormAI dataset, a large collection of 112, 000 AI-generated C programs with vulnerability classification.
Every program is labeled with the vulnerabilities found within the source code, indicating the type, line number, and vulnerable function name.
We make the source code available for the 112, 000 programs, accompanied by a separate file containing the vulnerabilities detected in each program.
arXiv Detail & Related papers (2023-07-05T10:39:58Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planing Agent (TaPA) in embodied tasks for grounded planning with physical scene constraint.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the generated plan from our TaPA framework can achieve higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - Multimodal Contextualized Plan Prediction for Embodied Task Completion [9.659463406886301]
Task planning is an important component of traditional robotics systems enabling robots to compose fine grained skills to perform more complex tasks.
Recent work building systems for translating natural language to executable actions for task completion in simulated embodied agents is focused on directly predicting low level action sequences.
We focus on predicting a higher level plan representation for one such embodied task completion dataset - TEACh.
arXiv Detail & Related papers (2023-05-10T22:29:12Z) - Synthesis of Mathematical programs from Natural Language Specifications [0.0]
Decision problems that are encountered in various business domains can be modeled as mathematical programs, i.e. optimization problems.
The process of conducting such modeling often requires the involvement of experts trained in operations research and advanced algorithms.
In this work we evaluate the efficacy of employing CodeT5 with data augmentation and post-processing of beams.
We observe that with these enhancements CodeT5 base gives an execution accuracy of 0.73 which is significantly better than zero-shot execution accuracy of 0.41 by ChatGPT and 0.36 by Codex.
arXiv Detail & Related papers (2023-03-30T06:10:00Z) - Generating Sequences by Learning to Self-Correct [64.0249217590888]
Self-Correction decouples an imperfect base generator from a separate corrector that learns to iteratively correct imperfect generations.
We show that Self-Correction improves upon the base generator in three diverse generation tasks.
arXiv Detail & Related papers (2022-10-31T18:09:51Z) - ProgPrompt: Generating Situated Robot Task Plans using Large Language
Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation functional across situated environments.
arXiv Detail & Related papers (2022-09-22T20:29:49Z) - Natural Language to Code Translation with Execution [82.52142893010563]
Execution result--minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z) - Actions You Can Handle: Dependent Types for AI Plans [2.064612766965483]
This paper proposes a methodology for embedding plans produced by AI planners into dependently-typed language Agda.
It enables users to reason about and verify more general and abstract properties of plans.
arXiv Detail & Related papers (2021-05-24T13:33:56Z) - Generating Adversarial Computer Programs using Optimized Obfuscations [43.95037234252815]
We investigate principled ways to adversarially perturb a computer program to fool such learned models.
We use program obfuscations, which have conventionally been used to avoid attempts at reverse engineering programs.
We show that our best attack proposal achieves a $52%$ improvement over a state-of-the-art attack generation approach.
arXiv Detail & Related papers (2021-03-18T10:47:15Z) - From Abstractions to Grounded Languages for Robust Coordination of Task
Planning Robots [4.496989927037321]
We study the automatic construction of languages that are maximally flexible while being sufficiently explicative for coordination.
Our language expresses a plan for any given task as a "plan sketch" to convey just-enough details while maximizing the flexibility to realize it.
arXiv Detail & Related papers (2019-05-01T22:05:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.