MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement Learning
- URL: http://arxiv.org/abs/2006.00195v1
- Date: Sat, 30 May 2020 06:39:55 GMT
- Title: MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement Learning
- Authors: Parvin Malekzadeh, Mohammad Salimibeni, Arash Mohammadi, Akbar Assa, and Konstantinos N. Plataniotis
- Abstract summary: This paper proposes an innovative Multiple Model Kalman Temporal Difference (MM-KTD) framework to learn optimal control policies.
An active learning method is proposed to enhance the sampling efficiency of the system.
Experimental results show the superiority of the MM-KTD framework over its state-of-the-art counterparts.
- Score: 36.14516028564416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a surge of interest in the development of advanced
Reinforcement Learning (RL) systems as intelligent approaches that learn optimal
control policies directly from smart agents' interactions with the environment.
Objectives: In model-free RL methods with a continuous state space, the value
function of the states typically needs to be approximated. In this regard, Deep
Neural Networks (DNNs) provide an attractive modeling mechanism to approximate
the value function using sample transitions. DNN-based solutions, however,
suffer from high sensitivity to parameter selection, are prone to overfitting,
and are not very sample-efficient. A Kalman-based methodology, on the other
hand, could serve as an efficient alternative. Such an approach, however,
commonly requires a priori information about the system (such as noise
statistics) to perform efficiently. The main objective of this paper is to
address this issue. Methods: As a remedy to the aforementioned problems, this
paper proposes an innovative Multiple Model Kalman Temporal Difference (MM-KTD)
framework, which adapts the parameters of the filter using the observed states
and rewards. Moreover, an active learning method is proposed to enhance the
sampling efficiency of the system. More specifically, the estimated uncertainty
of the value functions is exploited to form the behaviour policy, leading to
more visits to less certain values and thereby improving the overall learning
sample efficiency. As a result, the proposed MM-KTD framework can learn the
optimal policy with a significantly reduced number of samples compared to its
DNN-based counterparts. Results: To evaluate the performance of the proposed MM-KTD
framework, we have performed a comprehensive set of experiments based on three
RL benchmarks. Experimental results show the superiority of the MM-KTD framework
in comparison to its state-of-the-art counterparts.
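To make the mechanics described in the abstract concrete, the sketch below shows a linear Kalman temporal-difference update, a bank of filters with differing noise settings whose innovation likelihoods provide adaptive model weights, and a value-uncertainty estimate of the kind that could drive the exploration-favouring behaviour policy. The class and function names, the linear value approximation, and the likelihood-based weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: a linear Kalman temporal-difference (KTD) update,
# a bank of such filters with different noise settings, and a value-uncertainty
# estimate. The formulation is an assumption for illustration, not the paper's
# exact equations.
import numpy as np


class KTDModel:
    """One Kalman filter over the weights of a linear value function V(s) = w . phi(s)."""

    def __init__(self, n_features, process_var, obs_var):
        self.w = np.zeros(n_features)               # weight estimate (hidden state)
        self.P = np.eye(n_features)                 # weight covariance
        self.Q = process_var * np.eye(n_features)   # process noise (random-walk weights)
        self.R = obs_var                            # observation (reward) noise variance
        self.likelihood = 1.0                       # per-step innovation likelihood

    def update(self, phi_s, phi_next, reward, gamma=0.99):
        # TD observation model: r ~ (phi(s) - gamma * phi(s'))^T w + noise
        h = phi_s - gamma * phi_next
        P_pred = self.P + self.Q                    # predict step (random-walk weights)
        innov = reward - h @ self.w                 # innovation = TD error
        s_var = h @ P_pred @ h + self.R             # innovation variance
        K = P_pred @ h / s_var                      # Kalman gain
        self.w = self.w + K * innov
        self.P = P_pred - np.outer(K, h @ P_pred)
        # Gaussian likelihood of the innovation, used to weight this model.
        self.likelihood = np.exp(-0.5 * innov**2 / s_var) / np.sqrt(2 * np.pi * s_var)


def mm_update(models, phi_s, phi_next, reward, gamma=0.99):
    """Update every filter in the bank and return normalized model weights."""
    for m in models:
        m.update(phi_s, phi_next, reward, gamma)
    w = np.array([m.likelihood for m in models])
    return w / w.sum()


def value_and_uncertainty(models, model_weights, phi):
    """Mixture value estimate and its variance for one state's feature vector."""
    values = np.array([m.w @ phi for m in models])
    variances = np.array([phi @ m.P @ phi for m in models])
    v = model_weights @ values
    var = model_weights @ (variances + (values - v) ** 2)  # law of total variance
    return v, var
```

A behaviour policy in the abstract's spirit would then favour actions whose successor features carry the largest variance, so that less certain regions of the state space are visited more often; the exact adaptation and weighting rules are given in the paper itself.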
Related papers
- Efficient Multi-agent Reinforcement Learning by Planning [33.51282615335009]
Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks.
Most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios.
We propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search.
arXiv Detail & Related papers (2024-05-20T04:36:02Z)
- Mean-AP Guided Reinforced Active Learning for Object Detection [31.304039641225504]
This paper introduces Mean-AP Guided Reinforced Active Learning for Object Detection (MGRAL).
MGRAL is a novel approach that leverages the concept of expected model output changes as informativeness for deep detection networks.
Our approach demonstrates strong performance, establishing a new paradigm in reinforcement learning-based active learning for object detection.
arXiv Detail & Related papers (2023-10-12T14:59:22Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Value Summation: A Novel Scoring Function for MPC-based Model-based Reinforcement Learning [4.473327661758546]
This paper proposes a novel scoring function for the planning module of MPC-based reinforcement learning methods.
The proposed method enhances the learning efficiency of existing MPC-based MBRL methods using the discounted sum of values.
The results demonstrate that the proposed method outperforms the current state-of-the-art algorithms in terms of learning efficiency and average reward return.
arXiv Detail & Related papers (2022-09-16T20:52:39Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation [32.80370188601152]
The paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its Successor Representation-based variant, referred to as the MAK-SR.
The proposed MAK-TD/SR frameworks consider the continuous nature of the action-space that is associated with high dimensional multi-agent environments.
arXiv Detail & Related papers (2021-12-30T18:21:53Z)
- Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose Conservative Model-Based Actor-Critic (CMBAC), a novel approach that achieves high sample efficiency without strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.