Related papers: Formally Verified Approximate Policy Iteration

Formally Verified Approximate Policy Iteration

URL: http://arxiv.org/abs/2406.07340v1
Date: Tue, 11 Jun 2024 15:07:08 GMT
Title: Formally Verified Approximate Policy Iteration
Authors: Maximilian Schäffeler, Mohammad Abdulaziz,
Abstract summary: We show how the formalized algorithm can be refined to an executable, verified implementation. The implementation is evaluated on benchmark problems to show its practicability. As part of the refinement, we develop verified software to certify Linear Programming solutions.
Score: 11.602089225841633
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We formally verify an algorithm for approximate policy iteration on Factored Markov Decision Processes using the interactive theorem prover Isabelle/HOL. Next, we show how the formalized algorithm can be refined to an executable, verified implementation. The implementation is evaluated on benchmark problems to show its practicability. As part of the refinement, we develop verified software to certify Linear Programming solutions. The algorithm builds on a diverse library of formalized mathematics and pushes existing methodologies for interactive theorem provers to the limits. We discuss the process of the verification project and the modifications to the algorithm needed for formal verification.

Related papers

DeclareAligner: A Leap Towards Efficient Optimal Alignments for Declarative Process Model Conformance Checking [1.4064491732635231]
This paper introduces DeclareAligner, a novel algorithm that uses the A* search algorithm, an established AI pathfinding technique, to tackle the problem. The proposed method is evaluated using 8,054 synthetic and real-life alignment problems, demonstrating its ability to efficiently compute optimal alignments.
arXiv Detail & Related papers (2025-03-13T15:49:29Z)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning. LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors. We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space. Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
Automatic Program Instrumentation for Automatic Verification (Extended Technical Report) [0.0]
In deductive verification and software model checking, dealing with certain specification language constructs can be problematic. We propose instrumentation as a unifying verification paradigm that subsumes various existing ad-hoc approaches. We implement our approach in the MonoCera tool, which is tailored to the verification of programs with aggregation.
arXiv Detail & Related papers (2023-05-26T14:55:35Z)
FastDiagP: An Algorithm for Parallelized Direct Diagnosis [64.65251961564606]
FastDiag is a typical direct diagnosis algorithm that supports diagnosis calculation without predetermining conflicts. We propose a novel algorithm, so-called FastDiagP, which is based on the idea of speculative programming. This mechanism helps to provide consistency checks with fast answers and boosts the algorithm's runtime performance.
arXiv Detail & Related papers (2023-05-11T16:26:23Z)
Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning. We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
Formally Verified Solution Methods for Infinite-Horizon Markov Decision Processes [7.538482310185133]
We build on existing formalizations of probability theory to analyze the expected total reward criterion on infinite-horizon problems. Our developments formalize the Bellman equation and give conditions under which optimal policies exist. We show that our system can compete with and even outperform state-of-the-art systems.
arXiv Detail & Related papers (2022-06-05T13:03:34Z)
Formal Semantics and Formally Verified Validation for Temporal Planning [7.538482310185133]
We present a simple and concise semantics for temporal planning. Our semantics are developed and formalised in the logic of the interactive theorem prover Isabelle/HOL. We derive from those semantics a validation algorithm for temporal planning and show, using a formal proof in Isabelle/HOL, that this validation algorithm implements our semantics.
arXiv Detail & Related papers (2022-03-25T12:04:03Z)
Counterfactual Explanations in Sequential Decision Making Under Uncertainty [27.763369810430653]
We develop methods to find counterfactual explanations for sequential decision making processes. In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions. We show that our algorithm finds can provide valuable insights to enhance decision making under uncertainty.
arXiv Detail & Related papers (2021-07-06T17:38:19Z)
Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions. We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck. By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
arXiv Detail & Related papers (2020-09-07T16:47:30Z)
Verification and Validation of Convex Optimization Algorithms for Model Predictive Control [1.5322124183968633]
This article discusses the formal verification of the Ellipsoid method, a convex optimization algorithm, and its code implementation. The applicability and limitations of those code properties and proofs are presented as well. Modifications to the algorithm are presented which can be used to control its numerical stability.
arXiv Detail & Related papers (2020-05-26T09:18:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.