Improved Exploring Starts by Kernel Density Estimation-Based State-Space Coverage Acceleration in Reinforcement Learning
- URL: http://arxiv.org/abs/2105.08990v1
- Date: Wed, 19 May 2021 08:36:26 GMT
- Title: Improved Exploring Starts by Kernel Density Estimation-Based State-Space Coverage Acceleration in Reinforcement Learning
- Authors: Maximilian Schenke and Oliver Wallscheid
- Abstract summary: Reinforcement learning (RL) is a popular research topic in control engineering.
RL controllers are trained in direct interaction with the controlled system, rendering them data-driven and performance-oriented solutions.
DESSCA is a kernel density estimation-based state-space coverage acceleration method that prioritizes infrequently visited states.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is currently a popular research topic in control
engineering and has the potential to make its way to industrial and commercial
applications. Corresponding RL controllers are trained in direct interaction
with the controlled system, rendering them data-driven and performance-oriented
solutions. The best practice of exploring starts (ES) is used by default to
support the learning process via randomly picked initial states. However, this
method might deliver strongly biased results if the system's dynamics and
constraints lead to unfavorable sample distributions in the state space (e.g.,
condensed sample accumulation in certain state-space areas). To overcome this
issue, a kernel density estimation-based state-space coverage acceleration
(DESSCA) is proposed, which improves the ES concept by prioritizing
infrequently visited states for a more balanced coverage of the state space
during training. The considered test scenarios are the mountain car, cartpole, and
electric motor control environments. Using DQN and DDPG as exemplary RL
algorithms, it can be shown that DESSCA is a simple yet effective algorithmic
extension to the established ES approach.
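To make the exploring-starts extension concrete, below is a minimal, hypothetical sketch of the underlying idea: fit a kernel density estimate over the states visited so far and reset the environment to a candidate initial state with low estimated density. The class name, the uniform candidate sampling, the fallback threshold, and the use of scipy.stats.gaussian_kde are illustrative assumptions, not the authors' implementation of DESSCA.

```python
import numpy as np
from scipy.stats import gaussian_kde


class DensityAwareExploringStarts:
    """Sketch: propose reset states in low-density regions of visited states.

    Illustration only; the actual DESSCA algorithm may build its density
    estimate and choose reset states differently.
    """

    def __init__(self, state_low, state_high, n_candidates=256, bandwidth=None):
        self.low = np.asarray(state_low, dtype=float)
        self.high = np.asarray(state_high, dtype=float)
        self.n_candidates = n_candidates
        self.bandwidth = bandwidth          # bw_method passed to gaussian_kde
        self.visited = []                   # all state vectors seen so far

    def observe(self, state):
        self.visited.append(np.asarray(state, dtype=float))

    def propose_reset_state(self, rng=np.random):
        # Draw candidate initial states uniformly within the state constraints.
        dim = self.low.size
        candidates = rng.uniform(self.low, self.high, size=(self.n_candidates, dim))
        if len(self.visited) < 2 * dim:
            # Too few samples for a meaningful KDE: fall back to plain ES.
            return candidates[0]
        kde = gaussian_kde(np.stack(self.visited).T, bw_method=self.bandwidth)
        density = kde(candidates.T)
        # Prioritize the candidate lying in the least-visited region so far.
        return candidates[np.argmin(density)]
```

At the start of each training episode one would call propose_reset_state() and, provided the environment supports arbitrary resets, initialize it there; observe() is then called on every state encountered during the rollout.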
Related papers
- Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space [0.0]
In this paper, we introduce a prioritized combination of state-of-the-art approaches to outperform earlier results on continuous state and action space problems.
Our experiments also use parameter noise during training, resulting in more robust deep RL models.
arXiv Detail & Related papers (2024-10-15T04:12:12Z)
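The parameter noise mentioned in the entry above is commonly realized by perturbing the policy network's weights once per rollout, which gives temporally consistent exploration. The following is a small, hypothetical sketch of that idea in PyTorch; the network architecture, state dimensions, and noise scale are placeholders rather than the paper's setup.

```python
import copy

import torch
import torch.nn as nn


def perturbed_policy(policy: nn.Module, stddev: float = 0.05) -> nn.Module:
    """Return a copy of `policy` with Gaussian noise added to every weight."""
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for param in noisy.parameters():
            param.add_(torch.randn_like(param) * stddev)  # placeholder scale
    return noisy


# Placeholder policy: 4-dimensional state, 2 discrete actions.
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
explorer = perturbed_policy(policy)   # act with the noisy copy for one episode
action_logits = explorer(torch.zeros(1, 4))  # the original `policy` is trained
```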
- Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States [52.56827348431552]
Gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data.
This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
arXiv Detail & Related papers (2024-02-12T18:41:31Z)
- Laboratory Experiments of Model-based Reinforcement Learning for Adaptive Optics Control [0.565395466029518]
We implement and adapt an RL method called Policy Optimization for AO (PO4AO) to the GHOST test bench at ESO headquarters.
We study the predictive and self-calibrating aspects of the method.
The new PyTorch-based implementation on GHOST adds only around 700 microseconds on top of the hardware, pipeline, and Python interface latency.
arXiv Detail & Related papers (2023-12-30T14:11:43Z)
- Neural Episodic Control with State Abstraction [38.95199070504417]
Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample inefficiency.
This work introduces Neural Episodic Control with State Abstraction (NECSA).
We evaluate our approach on MuJoCo and Atari tasks in OpenAI Gym domains.
arXiv Detail & Related papers (2023-01-27T01:55:05Z)
- On the Effective Usage of Priors in RSS-based Localization [56.68864078417909]
In this paper, we study the localization problem in dense urban settings.
We propose a Received Signal Strength (RSS) fingerprint and convolutional neural network-based algorithm, LocUNet.
We first recognize LocUNet's ability to learn the underlying prior distribution of the Rx position or Rx and transmitter (Tx) association preferences from the training data, and attribute its high performance to these.
arXiv Detail & Related papers (2022-11-28T00:31:02Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
We demonstrate that our method achieves new state-of-the-art performance among search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
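Read literally, the VCR summary above suggests encoding the real next observation and a model-predicted ("imagined") next latent state, applying the same Q-value head to both, and aligning the two resulting action-value distributions. The sketch below is one hedged PyTorch interpretation of such a consistency term; the encoder, transition model, dimensions, and KL-based alignment are assumptions, not the paper's exact architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ValueConsistencySketch(nn.Module):
    """Sketch of a value-consistent representation loss (assumed details)."""

    def __init__(self, obs_dim, n_actions, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # Latent transition model: predicts the next latent from latent + action.
        self.transition = nn.Sequential(
            nn.Linear(latent_dim + n_actions, latent_dim), nn.ReLU())
        self.q_head = nn.Linear(latent_dim, n_actions)

    def consistency_loss(self, obs, action_onehot, next_obs):
        z = self.encoder(obs)
        z_imagined = self.transition(torch.cat([z, action_onehot], dim=-1))
        z_real = self.encoder(next_obs)
        # Compare action-value distributions instead of the raw latent states.
        log_q_imagined = F.log_softmax(self.q_head(z_imagined), dim=-1)
        q_real = F.softmax(self.q_head(z_real), dim=-1).detach()
        return F.kl_div(log_q_imagined, q_real, reduction="batchmean")
```

In practice such a term would be added to the usual TD loss; detaching the real-state branch is one plausible way to keep the alignment targets stable.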
- Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
To date, no such learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
- Reinforcement Learning for Low-Thrust Trajectory Design of Interplanetary Missions [77.34726150561087]
This paper investigates the use of reinforcement learning for the robust design of interplanetary trajectories in the presence of severe disturbances.
An open-source implementation of the state-of-the-art algorithm Proximal Policy Optimization is adopted.
The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law.
arXiv Detail & Related papers (2020-08-19T15:22:15Z)
- Localized active learning of Gaussian process state space models [63.97366815968177]
A globally accurate model is not required to achieve good performance in many common control applications.
We propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space.
By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy.
arXiv Detail & Related papers (2020-05-04T05:35:02Z)
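As a rough illustration of the active-learning idea in the last entry, one can fit a Gaussian process to logged transitions and then query the candidate state-action pair with the highest predictive uncertainty inside a bounded region of interest. Everything below (the toy dynamics, the bounds, and the scikit-learn model) is an assumed stand-in for illustration, not the method from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def toy_dynamics(x):
    # Stand-in for the unknown one-step dynamics: next state from (state, action).
    return np.sin(x[..., 0]) + 0.5 * x[..., 1]


rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(20, 2))   # logged (state, action) pairs
y_train = toy_dynamics(X_train)                  # observed next states

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
gp.fit(X_train, y_train)

# Bounded subset of the state-action space we actually care about.
candidates = rng.uniform([-0.5, -0.5], [0.5, 0.5], size=(500, 2))
_, std = gp.predict(candidates, return_std=True)
next_query = candidates[np.argmax(std)]          # most informative point to visit
print("next state-action pair to explore:", next_query)
```

A model predictive controller would then be used to actually steer the system toward that region, which is the part this sketch leaves out.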
This list is automatically generated from the titles and abstracts of the papers on this site.