Improved Exploring Starts by Kernel Density Estimation-Based State-Space Coverage Acceleration in Reinforcement Learning
- URL: http://arxiv.org/abs/2105.08990v1
- Date: Wed, 19 May 2021 08:36:26 GMT
- Title: Improved Exploring Starts by Kernel Density Estimation-Based State-Space Coverage Acceleration in Reinforcement Learning
- Authors: Maximilian Schenke and Oliver Wallscheid
- Abstract summary: Reinforcement learning (RL) is a popular research topic in control engineering.
RL controllers are trained in direct interaction with the controlled system, rendering them data-driven and performance-oriented solutions.
DESSCA is a kernel density estimation-based state-space coverage acceleration method that prioritizes infrequently visited states.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is currently a popular research topic in control
engineering and has the potential to make its way to industrial and commercial
applications. Corresponding RL controllers are trained in direct interaction
with the controlled system, rendering them data-driven and performance-oriented
solutions. The best practice of exploring starts (ES) is used by default to
support the learning process via randomly picked initial states. However, this
method might deliver strongly biased results if the system's dynamics and
constraints lead to unfavorable sample distributions in the state space (e.g.,
condensed sample accumulation in certain state-space areas). To overcome this
issue, a kernel density estimation-based state-space coverage acceleration
(DESSCA) is proposed, which improves the ES concept by prioritizing
infrequently visited states for a more balanced coverage of the state space
during training. The considered test scenarios are the mountain car, cartpole, and
electric motor control environments. Using DQN and DDPG as exemplary RL
algorithms, it can be shown that DESSCA is a simple yet effective algorithmic
extension to the established ES approach.
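To make the exploring-starts extension concrete, below is a minimal, hypothetical sketch of the underlying idea: fit a kernel density estimate over the states visited so far and reset the environment to a candidate initial state with low estimated density. The class name, the uniform candidate sampling, the fallback threshold, and the use of scipy.stats.gaussian_kde are illustrative assumptions, not the authors' implementation of DESSCA.

```python
import numpy as np
from scipy.stats import gaussian_kde


class DensityAwareExploringStarts:
    """Sketch: propose reset states in low-density regions of visited states.

    Illustration only; the actual DESSCA algorithm may build its density
    estimate and choose reset states differently.
    """

    def __init__(self, state_low, state_high, n_candidates=256, bandwidth=None):
        self.low = np.asarray(state_low, dtype=float)
        self.high = np.asarray(state_high, dtype=float)
        self.n_candidates = n_candidates
        self.bandwidth = bandwidth          # bw_method passed to gaussian_kde
        self.visited = []                   # all state vectors seen so far

    def observe(self, state):
        self.visited.append(np.asarray(state, dtype=float))

    def propose_reset_state(self, rng=np.random):
        # Draw candidate initial states uniformly within the state constraints.
        dim = self.low.size
        candidates = rng.uniform(self.low, self.high, size=(self.n_candidates, dim))
        if len(self.visited) < 2 * dim:
            # Too few samples for a meaningful KDE: fall back to plain ES.
            return candidates[0]
        kde = gaussian_kde(np.stack(self.visited).T, bw_method=self.bandwidth)
        density = kde(candidates.T)
        # Prioritize the candidate lying in the least-visited region so far.
        return candidates[np.argmin(density)]
```

At the start of each training episode one would call propose_reset_state() and, provided the environment supports arbitrary resets, initialize it there; observe() is then called on every state encountered during the rollout.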
Related papers
- Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space [0.0]
In this paper, we introduce a prioritized combination of state-of-the-art approaches to outperform earlier results on continuous state and action space problems.
Our experiments also use parameter noise during training, resulting in more robust deep RL models.
arXiv Detail & Related papers (2024-10-15T04:12:12Z)
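The parameter noise mentioned in the entry above is commonly realized by perturbing the policy network's weights once per rollout, which gives temporally consistent exploration. The following is a small, hypothetical sketch of that idea in PyTorch; the network architecture, state dimensions, and noise scale are placeholders rather than the paper's setup.

```python
import copy

import torch
import torch.nn as nn


def perturbed_policy(policy: nn.Module, stddev: float = 0.05) -> nn.Module:
    """Return a copy of `policy` with Gaussian noise added to every weight."""
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for param in noisy.parameters():
            param.add_(torch.randn_like(param) * stddev)  # placeholder scale
    return noisy


# Placeholder policy: 4-dimensional state, 2 discrete actions.
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
explorer = perturbed_policy(policy)   # act with the noisy copy for one episode
action_logits = explorer(torch.zeros(1, 4))  # the original `policy` is trained
```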
- Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States [52.56827348431552]
Gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data.
This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
arXiv Detail & Related papers (2024-02-12T18:41:31Z)
- Laboratory Experiments of Model-based Reinforcement Learning for Adaptive Optics Control [0.565395466029518]
We implement and adapt an RL method called Policy Optimization for AO (PO4AO) to the GHOST test bench at ESO headquarters.
We study the predictive and self-calibrating aspects of the method.
The new PyTorch-based implementation on GHOST adds only around 700 microseconds on top of the hardware, pipeline, and Python interface latency.
arXiv Detail & Related papers (2023-12-30T14:11:43Z)
- Neural Episodic Control with State Abstraction [38.95199070504417]
Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample inefficiency.
This work introduces Neural Episodic Control with State Abstraction (NECSA).
We evaluate our approach on MuJoCo and Atari tasks in OpenAI Gym domains.
arXiv Detail & Related papers (2023-01-27T01:55:05Z)
- On the Effective Usage of Priors in RSS-based Localization [56.68864078417909]
In this paper, we study the localization problem in dense urban settings.
We propose a Received Signal Strength (RSS) fingerprint and convolutional neural network-based algorithm, LocUNet.
We first recognize LocUNet's ability to learn the underlying prior distribution of the Rx position or Rx and transmitter (Tx) association preferences from the training data, and attribute its high performance to these.
arXiv Detail & Related papers (2022-11-28T00:31:02Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
We demonstrate that our method achieves new state-of-the-art performance among search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
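Read literally, the VCR summary above suggests encoding the real next observation and a model-predicted ("imagined") next latent state, applying the same Q-value head to both, and aligning the two resulting action-value distributions. The sketch below is one hedged PyTorch interpretation of such a consistency term; the encoder, transition model, dimensions, and KL-based alignment are assumptions, not the paper's exact architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ValueConsistencySketch(nn.Module):
    """Sketch of a value-consistent representation loss (assumed details)."""

    def __init__(self, obs_dim, n_actions, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # Latent transition model: predicts the next latent from latent + action.
        self.transition = nn.Sequential(
            nn.Linear(latent_dim + n_actions, latent_dim), nn.ReLU())
        self.q_head = nn.Linear(latent_dim, n_actions)

    def consistency_loss(self, obs, action_onehot, next_obs):
        z = self.encoder(obs)
        z_imagined = self.transition(torch.cat([z, action_onehot], dim=-1))
        z_real = self.encoder(next_obs)
        # Compare action-value distributions instead of the raw latent states.
        log_q_imagined = F.log_softmax(self.q_head(z_imagined), dim=-1)
        q_real = F.softmax(self.q_head(z_real), dim=-1).detach()
        return F.kl_div(log_q_imagined, q_real, reduction="batchmean")
```

In practice such a term would be added to the usual TD loss; detaching the real-state branch is one plausible way to keep the alignment targets stable.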
- Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
To date, no such learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
- Reinforcement Learning for Low-Thrust Trajectory Design of Interplanetary Missions [77.34726150561087]
This paper investigates the use of reinforcement learning for the robust design of interplanetary trajectories in the presence of severe disturbances.
An open-source implementation of the state-of-the-art algorithm Proximal Policy Optimization is adopted.
The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law.
arXiv Detail & Related papers (2020-08-19T15:22:15Z)
- Localized active learning of Gaussian process state space models [63.97366815968177]
A globally accurate model is not required to achieve good performance in many common control applications.
We propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space.
By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy.
arXiv Detail & Related papers (2020-05-04T05:35:02Z)
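As a rough illustration of the active-learning idea in the last entry, one can fit a Gaussian process to logged transitions and then query the candidate state-action pair with the highest predictive uncertainty inside a bounded region of interest. Everything below (the toy dynamics, the bounds, and the scikit-learn model) is an assumed stand-in for illustration, not the method from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def toy_dynamics(x):
    # Stand-in for the unknown one-step dynamics: next state from (state, action).
    return np.sin(x[..., 0]) + 0.5 * x[..., 1]


rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(20, 2))   # logged (state, action) pairs
y_train = toy_dynamics(X_train)                  # observed next states

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
gp.fit(X_train, y_train)

# Bounded subset of the state-action space we actually care about.
candidates = rng.uniform([-0.5, -0.5], [0.5, 0.5], size=(500, 2))
_, std = gp.predict(candidates, return_std=True)
next_query = candidates[np.argmax(std)]          # most informative point to visit
print("next state-action pair to explore:", next_query)
```

A model predictive controller would then be used to actually steer the system toward that region, which is the part this sketch leaves out.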
This list is automatically generated from the titles and abstracts of the papers on this site.