Probabilistic Constrained Reinforcement Learning with Formal Interpretability
- URL: http://arxiv.org/abs/2307.07084v4
- Date: Mon, 17 Jun 2024 12:56:53 GMT
- Title: Probabilistic Constrained Reinforcement Learning with Formal Interpretability
- Authors: Yanran Wang, Qiuchen Qian, David Boyle
- Abstract summary: We propose a novel Adaptive Wasserstein Variational Optimization, namely AWaVO, to tackle these interpretability challenges.
Our approach uses formal methods to achieve interpretability in the form of a convergence guarantee, training transparency, and intrinsic decision interpretation.
In comparison with state-of-the-art benchmarks including TRPO-IPO, PCPO and CRPO, we empirically verify that AWaVO offers a reasonable trade-off between high performance and sufficient interpretability.
- Score: 2.990411348977783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning can provide effective reasoning for sequential decision-making problems with variable dynamics. In practical implementations, however, such reasoning poses a persistent challenge in interpreting the reward function and the corresponding optimal policy. Consequently, representing sequential decision-making problems as probabilistic inference can have considerable value, as, in principle, the inference offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of policy optimization. In this study, we propose a novel Adaptive Wasserstein Variational Optimization, namely AWaVO, to tackle these interpretability challenges. Our approach uses formal methods to achieve interpretability in the form of a convergence guarantee, training transparency, and intrinsic decision interpretation. To demonstrate its practicality, we showcase guaranteed interpretability with an optimal global convergence rate in simulation and in practical quadrotor tasks. In comparison with state-of-the-art benchmarks including TRPO-IPO, PCPO and CRPO, we empirically verify that AWaVO offers a reasonable trade-off between high performance and sufficient interpretability.
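To make the "policy optimization as probabilistic inference" view in the abstract concrete, the sketch below shows a generic inference-style policy update: rollout returns are exponentiated into pseudo-likelihood weights, the policy parameter is moved toward the weighted posterior mean, and a quadratic transport penalty (a crude stand-in for the Wasserstein regularization that AWaVO is named after) limits how far each update travels. The toy task, the Gaussian search distribution, the temperature, and the penalty weight are all illustrative assumptions; this is not the authors' AWaVO algorithm.

```python
# Minimal sketch of policy optimization viewed as probabilistic inference.
# Everything here (toy task, Gaussian sampling, quadratic transport penalty)
# is an illustrative assumption, not a detail taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def rollout_reward(theta, horizon=20):
    """Toy 1-D task: actions a_t ~ N(theta, 1); higher reward for actions near 1."""
    actions = theta + rng.normal(size=horizon)
    return -np.mean((actions - 1.0) ** 2)

def inference_style_update(theta_old, n_samples=64, temperature=0.5, w2_weight=0.1):
    """One update: treat exp(return / temperature) as a pseudo-likelihood over
    candidate policy parameters, take the posterior-weighted mean, and shrink
    the move toward theta_old via a quadratic transport penalty."""
    candidates = theta_old + 0.5 * rng.normal(size=n_samples)
    returns = np.array([rollout_reward(c) for c in candidates])
    weights = np.exp((returns - returns.max()) / temperature)
    weights /= weights.sum()
    theta_post = float(np.sum(weights * candidates))   # "posterior" mean
    # Quadratic penalty on the parameter move, a crude stand-in for a
    # Wasserstein trust-region term: the minimizer of
    #   (theta - theta_post)^2 + w2_weight * (theta - theta_old)^2
    # is the weighted average below.
    return (theta_post + w2_weight * theta_old) / (1.0 + w2_weight)

theta = -2.0
for _ in range(30):
    theta = inference_style_update(theta)
print(f"final policy mean ~ {theta:.2f} (rewarding region is near 1.0)")
```

Running the loop moves the policy mean from -2 toward the rewarding region near 1 while the penalty keeps each step small, which is the qualitative behaviour a trust-region-style variational update is meant to provide.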
Related papers
- Probabilistic Subspace Manifolds for Contextual Inference in Large Language Models [0.0]
Representing token embeddings as probability distributions allows for more flexible contextual inference.
Probability embeddings improve neighborhood consistency and decrease redundancy.
Probability embeddings preserve contextual integrity even under robustness-based evaluation scenarios.
arXiv Detail & Related papers (2025-02-07T21:32:32Z) - Prediction-Powered E-Values [0.66567375919026]
We apply ideas of prediction-powered inference to e-values.
We show that every inference procedure that can be framed in terms of e-values has a prediction-powered counterpart.
Our approach is modular and easily integrable into existing algorithms.
arXiv Detail & Related papers (2025-02-06T18:36:01Z) - Learning Dynamic Representations via An Optimally-Weighted Maximum Mean Discrepancy Optimization Framework for Continual Learning [16.10753846850319]
Continual learning allows models to persistently acquire and retain information.
However, catastrophic forgetting can severely impair model performance.
We introduce a novel framework termed Optimally-Weighted Maximum Mean Discrepancy (OWMMD), which imposes penalties on representation alterations.
arXiv Detail & Related papers (2025-01-21T13:33:45Z) - Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning [20.48276559928517]
We introduce a straightforward architecture that constrains the discrete policy to be unimodal using Poisson probability distributions.
We conduct experiments to show that the discrete policy with the unimodal probability distribution provides significantly faster convergence and higher performance for on-policy reinforcement learning algorithms.
arXiv Detail & Related papers (2024-08-01T06:06:53Z) - An Efficient Approach for Solving Expensive Constrained Multiobjective Optimization Problems [0.0]
An efficient probabilistic selection based constrained multi-objective EA is proposed, referred to as PSCMOEA.
It comprises novel elements such as (a) an adaptive search bound identification scheme based on the feasibility and convergence status of evaluated solutions.
Numerical experiments are conducted on an extensive range of challenging constrained problems using low evaluation budgets to simulate ECMOPs.
arXiv Detail & Related papers (2024-05-22T02:32:58Z) - Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Probabilistic Constraint for Safety-Critical Reinforcement Learning [13.502008069967552]
We consider the problem of learning safe policies for probabilistic-constrained reinforcement learning (RL).
We provide an improved gradient estimator, SPG-Actor-Critic, with lower variance than SPG-REINFORCE.
We propose a Safe Primal-Dual algorithm that can leverage both SPGs to learn safe policies; a generic primal-dual sketch of this constraint structure appears after this list.
arXiv Detail & Related papers (2023-06-29T19:41:56Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Bounded Robustness in Reinforcement Learning via Lexicographic
Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost.
We study how policies can be maximally robust to arbitrary observational noise.
We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
arXiv Detail & Related papers (2022-09-30T08:53:18Z) - Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z) - Scalable Uncertainty for Computer Vision with Functional Variational
Inference [18.492485304537134]
We leverage the formulation of variational inference in function space.
We obtain predictive uncertainty estimates at the cost of a single forward pass through any chosen CNN architecture.
We propose numerically efficient algorithms which enable fast training in the context of high-dimensional tasks.
arXiv Detail & Related papers (2020-03-06T19:09:42Z)