Formally Verified Approximate Policy Iteration
- URL: http://arxiv.org/abs/2406.07340v1
- Date: Tue, 11 Jun 2024 15:07:08 GMT
- Title: Formally Verified Approximate Policy Iteration
- Authors: Maximilian Schäffeler, Mohammad Abdulaziz,
- Abstract summary: We show how the formalized algorithm can be refined to an executable, verified implementation.
The implementation is evaluated on benchmark problems to show its practicability.
As part of the refinement, we develop verified software to certify Linear Programming solutions.
- Score: 11.602089225841633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We formally verify an algorithm for approximate policy iteration on Factored Markov Decision Processes using the interactive theorem prover Isabelle/HOL. Next, we show how the formalized algorithm can be refined to an executable, verified implementation. The implementation is evaluated on benchmark problems to show its practicability. As part of the refinement, we develop verified software to certify Linear Programming solutions. The algorithm builds on a diverse library of formalized mathematics and pushes existing methodologies for interactive theorem provers to the limits. We discuss the process of the verification project and the modifications to the algorithm needed for formal verification.
Related papers
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z) - Automatic Program Instrumentation for Automatic Verification (Extended
Technical Report) [0.0]
In deductive verification and software model checking, dealing with certain specification language constructs can be problematic.
We propose instrumentation as a unifying verification paradigm that subsumes various existing ad-hoc approaches.
We implement our approach in the MonoCera tool, which is tailored to the verification of programs with aggregation.
arXiv Detail & Related papers (2023-05-26T14:55:35Z) - FastDiagP: An Algorithm for Parallelized Direct Diagnosis [64.65251961564606]
FastDiag is a typical direct diagnosis algorithm that supports diagnosis calculation without predetermining conflicts.
We propose a novel algorithm, so-called FastDiagP, which is based on the idea of speculative programming.
This mechanism helps to provide consistency checks with fast answers and boosts the algorithm's runtime performance.
arXiv Detail & Related papers (2023-05-11T16:26:23Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z) - Formally Verified Solution Methods for Infinite-Horizon Markov Decision
Processes [7.538482310185133]
We build on existing formalizations of probability theory to analyze the expected total reward criterion on infinite-horizon problems.
Our developments formalize the Bellman equation and give conditions under which optimal policies exist.
We show that our system can compete with and even outperform state-of-the-art systems.
arXiv Detail & Related papers (2022-06-05T13:03:34Z) - Formal Semantics and Formally Verified Validation for Temporal Planning [7.538482310185133]
We present a simple and concise semantics for temporal planning.
Our semantics are developed and formalised in the logic of the interactive theorem prover Isabelle/HOL.
We derive from those semantics a validation algorithm for temporal planning and show, using a formal proof in Isabelle/HOL, that this validation algorithm implements our semantics.
arXiv Detail & Related papers (2022-03-25T12:04:03Z) - Counterfactual Explanations in Sequential Decision Making Under
Uncertainty [27.763369810430653]
We develop methods to find counterfactual explanations for sequential decision making processes.
In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions.
We show that our algorithm finds can provide valuable insights to enhance decision making under uncertainty.
arXiv Detail & Related papers (2021-07-06T17:38:19Z) - Modularity in Reinforcement Learning via Algorithmic Independence in
Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z) - Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck.
By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
arXiv Detail & Related papers (2020-09-07T16:47:30Z) - Verification and Validation of Convex Optimization Algorithms for Model
Predictive Control [1.5322124183968633]
This article discusses the formal verification of the Ellipsoid method, a convex optimization algorithm, and its code implementation.
The applicability and limitations of those code properties and proofs are presented as well.
Modifications to the algorithm are presented which can be used to control its numerical stability.
arXiv Detail & Related papers (2020-05-26T09:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.