Active Learning of Markov Decision Processes using Baum-Welch algorithm
(Extended)
- URL: http://arxiv.org/abs/2110.03014v1
- Date: Wed, 6 Oct 2021 18:54:19 GMT
- Title: Active Learning of Markov Decision Processes using Baum-Welch algorithm
(Extended)
- Authors: Giovanni Bacci, Anna Ingólfsdóttir, Kim Larsen, Raphaël Reynouard
- Abstract summary: This paper revisits and adapts the classic Baum-Welch algorithm for learning Markov decision processes and Markov chains.
We empirically compare our approach with state-of-the-art tools and demonstrate that the proposed active learning procedure can significantly reduce the number of observations required to obtain accurate models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cyber-physical systems (CPSs) are naturally modelled as reactive systems with
nondeterministic and probabilistic dynamics. Model-based verification
techniques have proved effective in the deployment of safety-critical CPSs.
Central for a successful application of such techniques is the construction of
an accurate formal model for the system. Manual construction can be a
resource-demanding and error-prone process, thus motivating the design of
automata learning algorithms to synthesise a system model from observed system
behaviours.
This paper revisits and adapts the classic Baum-Welch algorithm for learning
Markov decision processes and Markov chains. For the case of MDPs, which
typically demand more observations, we present a model-based active learning
sampling strategy that chooses examples which are most informative w.r.t. the
current model hypothesis. We empirically compare our approach with
state-of-the-art tools and demonstrate that the proposed active learning
procedure can significantly reduce the number of observations required to
obtain accurate models.
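The Baum-Welch algorithm that the paper builds on is an expectation-maximisation procedure, classically stated for hidden Markov models. As a rough self-contained illustration of the forward-backward recursion and re-estimation step it rests on (a textbook HMM version, not the paper's MC/MDP adaptation; the toy parameters and names below are ours):

```python
def forward(obs, pi, A, B):
    """Forward pass: alpha[t][i] = P(o_1..o_t, s_t = i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        alpha.append([B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j]
                                         for i in range(n))
                      for j in range(n)])
    return alpha

def backward(obs, A, B):
    """Backward pass: beta[t][i] = P(o_{t+1}..o_T | s_t = i)."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta

def baum_welch_step(obs, pi, A, B):
    """One EM iteration: re-estimate (pi, A, B) from expected counts.

    Returns the new parameters and the likelihood Z of `obs` under the
    *input* parameters; EM guarantees Z never decreases across iterations.
    """
    n, T = len(pi), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    Z = sum(alpha[T - 1][i] for i in range(n))  # sequence likelihood
    # gamma[t][i]: posterior probability of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / Z for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / Z
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    m = len(B[0])
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == o) /
              sum(gamma[t][i] for t in range(T))
              for o in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, Z
```

Iterating the step never decreases the sequence likelihood `Z` (the standard EM monotonicity guarantee), which is what makes the algorithm attractive as a basis for learning probabilistic models from observed behaviours.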
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - Model-based Policy Optimization using Symbolic World Model [46.42871544295734]
The application of learning-based control methods in robotics presents significant challenges.
One is the low sample efficiency with which model-free reinforcement learning algorithms use observation data.
We suggest approximating transition dynamics with symbolic expressions, which are generated via symbolic regression.
arXiv Detail & Related papers (2024-07-18T13:49:21Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Partitioned Active Learning for Heterogeneous Systems [5.331649110169476]
We propose the partitioned active learning strategy established upon partitioned GP (PGP) modeling.
Global searching scheme accelerates the exploration aspect of active learning.
Local searching exploits the active learning criterion induced by the local GP model.
arXiv Detail & Related papers (2021-05-14T02:05:31Z) - Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z) - Prediction-Centric Learning of Independent Cascade Dynamics from Partial
Observations [13.680949377743392]
We address the problem of learning a spreading model such that the predictions generated from this model are accurate.
We introduce a computationally efficient algorithm, based on a scalable dynamic message-passing approach.
We show that tractable inference from the learned model generates a better prediction of marginal probabilities compared to the original model.
arXiv Detail & Related papers (2020-07-13T17:58:21Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Online learning of both state and dynamics using ensemble Kalman filters [0.0]
This paper investigates the possibility of learning both the dynamics and the state online, i.e. updating their estimates at any time.
We consider the implications of learning dynamics online through (i) a global EnKF, (ii) a local EnKF, and (iii) an iterative EnKF.
We then demonstrate numerically the efficiency and assess the accuracy of these methods using one-dimensional, one-scale and two-scale chaotic Lorenz models.
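A common way to learn dynamics and state jointly with an EnKF is to augment the state with the unknown dynamics parameters and let the ensemble cross-covariance update both. As a hedged sketch of that idea on a scalar linear system with one unknown parameter (a toy setup of ours, not the paper's Lorenz experiments or its global/local/iterative variants):

```python
import random

def enkf_step(ensemble, y_obs, obs_noise_std, dyn_noise_std):
    """One EnKF cycle on an ensemble of augmented members (x, a):
    forecast each member through x' = a * x + noise, then update both
    x and a with ensemble Kalman gains for a direct observation of x."""
    # Forecast step: propagate each member with its own parameter a
    fc = [(a * x + random.gauss(0.0, dyn_noise_std), a) for x, a in ensemble]
    N = len(fc)
    mx = sum(x for x, _ in fc) / N
    ma = sum(a for _, a in fc) / N
    # Ensemble (cross-)covariances with the observed component x
    pxx = sum((x - mx) ** 2 for x, _ in fc) / (N - 1)
    pax = sum((a - ma) * (x - mx) for x, a in fc) / (N - 1)
    r = obs_noise_std ** 2
    kx = pxx / (pxx + r)  # gain for the state
    ka = pax / (pxx + r)  # gain for the dynamics parameter
    # Analysis step: perturbed observations, joint update of x and a
    out = []
    for x, a in fc:
        innov = y_obs + random.gauss(0.0, obs_noise_std) - x
        out.append((x + kx * innov, a + ka * innov))
    return out
```

Because the parameter is updated only through its correlation with the observed state, its ensemble spread shrinks over time; practical schemes add inflation or artificial parameter noise to keep the estimate responsive.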
arXiv Detail & Related papers (2020-06-06T13:19:26Z) - Model-based Multi-Agent Reinforcement Learning with Cooperative
Prioritized Sweeping [4.5497948012757865]
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping.
The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function.
Our method outperforms the state-of-the-art sparse cooperative Q-learning algorithm, both on the well-known SysAdmin benchmark and on randomized environments.
arXiv Detail & Related papers (2020-01-15T19:13:44Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
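Entropy-regularized RL, one side of the connection above, replaces the hard max in the Q-learning backup with a soft (log-sum-exp) value. A minimal tabular sketch under our own toy assumptions, not the paper's MPC-based algorithm:

```python
import math

def soft_q_update(Q, s, a, r, s2, alpha, gamma, tau):
    """Entropy-regularized (soft) Q-learning backup: the target uses
    V(s') = tau * log sum_a' exp(Q(s',a') / tau) instead of max_a'."""
    soft_v = tau * math.log(sum(math.exp(Q[s2][b] / tau)
                                for b in range(len(Q[s2]))))
    Q[s][a] += alpha * (r + gamma * soft_v - Q[s][a])
```

As the temperature `tau` goes to zero the soft value approaches the hard max and the update reduces to standard Q-learning; larger `tau` keeps the induced policy stochastic.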
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.