Related papers: VeriPlan: Integrating Formal Verification and LLMs into End-User Planning

URL: http://arxiv.org/abs/2502.17898v1
Date: Tue, 25 Feb 2025 06:53:00 GMT
Title: VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
Authors: Christine Lee, David Porfirio, Xinyu Jessica Wang, Kevin Zhao, Bilge Mutlu
Abstract summary: This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.
Score: 12.09421887596555
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist users with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three additional core features -- a rule translator, flexibility sliders, and a model checker -- that engage users in the verification process. Through a user study (n=12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.
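To make the idea of verifying an LLM-generated plan against user rules more concrete, here is a minimal sketch. It is not VeriPlan's implementation: the `Rule` class, the `strict` flag (a crude stand-in for a flexibility slider), and the budget/hours example are all hypothetical, and the check is a simple invariant test along a single plan trace rather than a full model checker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]

@dataclass
class Rule:
    name: str
    holds: Callable[[State], bool]  # invariant expected to hold in every state
    strict: bool                    # hypothetical stand-in for a flexibility slider

def apply_step(state: State, step: Dict[str, int]) -> State:
    """Return the next state after adding the step's deltas to the current state."""
    nxt = dict(state)
    for key, delta in step.items():
        nxt[key] = nxt.get(key, 0) + delta
    return nxt

def check_plan(initial: State, plan: List[Dict[str, int]],
               rules: List[Rule]) -> List[Tuple[int, str, bool]]:
    """Visit every state the plan reaches and record (step number, rule name, strict)."""
    violations = []
    state = initial
    for i, step in enumerate(plan, start=1):
        state = apply_step(state, step)
        for rule in rules:
            if not rule.holds(state):
                violations.append((i, rule.name, rule.strict))
    return violations

# Hypothetical user rules and a three-step plan.
rules = [
    Rule("budget never exceeds 100", lambda s: s["spent"] <= 100, strict=True),
    Rule("no more than 8 hours of activities", lambda s: s["hours"] <= 8, strict=False),
]
plan = [{"spent": 40, "hours": 3}, {"spent": 30, "hours": 4}, {"spent": 50, "hours": 2}]

violations = check_plan({"spent": 0, "hours": 0}, plan, rules)
hard = [v for v in violations if v[2]]      # would force the planner to revise
soft = [v for v in violations if not v[2]]  # could be surfaced to the user as warnings
print(hard, soft)
```

In this sketch, both rules fail only at the third step; a system in this style could feed the strict violation back to the planner while merely reporting the relaxed one to the user.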

Related papers

PlanGenLLMs: A Modern Survey of LLM Planning Capabilities [12.322175348741435]
LLMs have immense potential for generating plans, transforming an initial world state into a desired goal state. Many of these systems are tailored to specific problems, making it challenging to compare them or determine the best approach for new tasks. Our survey aims to offer a comprehensive overview of current LLM planners to fill this gap. It builds on foundational work by Kartam and Wilkins (1990) and examines six key performance criteria: completeness, executability, optimality, representation, generalization, and efficiency.
arXiv Detail & Related papers (2025-02-16T17:54:57Z)
Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making. Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance. We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z)
Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming [13.246017517159043]
Large language models (LLMs) have recently demonstrated strong potential in solving planning problems. We propose LLMFP, a framework that leverages LLMs to capture key information from planning problems and formally formulate and solve them as optimization problems from scratch. We apply LLMFP to 9 planning problems, ranging from multi-constraint decision making to multi-step planning problems, and demonstrate that LLMFP achieves on average 83.7% and 86.8% optimal rate across 9 tasks for GPT-4o and Claude 3.5 Sonnet.
arXiv Detail & Related papers (2024-10-15T23:20:54Z)
SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models [24.22168861692322]
We present three key insights: equivalence voting, constrained decoding, and domain-specific fine-tuning. Equivalence voting ensures consistency by generating and sampling multiple Linear Temporal Logic (LTL) formulas. Constrained decoding then uses the generated formula to enforce the autoregressive inference of plans. Domain-specific fine-tuning customizes LLMs to produce safe and efficient plans within specific task domains.
arXiv Detail & Related papers (2024-09-28T22:33:44Z)
Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs [59.76268575344119]
We introduce a novel framework for enhancing large language models' (LLMs) planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with KG data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval.
arXiv Detail & Related papers (2024-06-20T13:07:38Z)
Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs). We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios. We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
Ask-before-Plan: Proactive Language Agents for Real-World Planning [68.08024918064503]
Proactive Agent Planning requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction. We propose a novel multi-agent framework, Clarification-Execution-Planning (CEP), which consists of three agents specialized in clarification, execution, and planning.
arXiv Detail & Related papers (2024-06-18T14:07:28Z)
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents [39.53593677934238]
Large Language Models (LLMs) enable AI Agents to automatically generate and execute multi-step plans to solve complex tasks. However, current LLM-based agents frequently generate invalid or non-executable plans. This paper proposes a novel "Formal-LLM" framework for LLM-based agents by integrating the expressiveness of natural language and the precision of formal language.
arXiv Detail & Related papers (2024-02-01T17:30:50Z)
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs. We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning [59.543858889996024]
Large language models (LLMs) have revolutionized a large variety of NLP tasks. We show how to leverage an LLM to automatically generate NL prompts from PDDL input.
arXiv Detail & Related papers (2023-11-16T11:55:27Z)
Learning to Plan with Natural Language [111.76828049344839]
Large Language Models (LLMs) have shown remarkable performance in various basic natural language tasks. To complete a complex task, however, LLMs still need a task plan that guides them to generate specific solutions step by step. We propose the Learning to Plan method, which involves two phases: (1) In the first phase, learning the task plan, it iteratively updates the task plan with new step-by-step solutions and behavioral instructions, which are obtained by prompting LLMs to derive them from training error feedback.
arXiv Detail & Related papers (2023-04-20T17:09:12Z)
Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z)
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change [34.93870615625937]
PlanBench is a benchmark suite based on the kinds of domains used in the automated planning community. PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities.
arXiv Detail & Related papers (2022-06-21T16:15:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.