Interpretable Option Discovery using Deep Q-Learning and Variational
Autoencoders
- URL: http://arxiv.org/abs/2210.01231v1
- Date: Mon, 3 Oct 2022 21:08:39 GMT
- Title: Interpretable Option Discovery using Deep Q-Learning and Variational
Autoencoders
- Authors: Per-Arne Andersen and Ole-Christoffer Granmo and Morten Goodwin
- Abstract summary: The DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning.
Experiments show that the DVQN algorithm, with automatic initiation and termination, has comparable performance to Rainbow.
- Score: 9.432068833600884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (RL) is unquestionably a robust framework to
train autonomous agents in a wide variety of disciplines. However, traditional
deep and shallow model-free RL algorithms suffer from low sample efficiency and
inadequate generalization for sparse state spaces. The options framework with
temporal abstractions is perhaps the most promising method to solve these
problems, but it still has noticeable shortcomings. It only guarantees local
convergence, and it is challenging to automate initiation and termination
conditions, which in practice are commonly hand-crafted.
Our proposal, the Deep Variational Q-Network (DVQN), combines deep
generative modeling and reinforcement learning. The algorithm finds good
policies from a Gaussian-distributed latent space, which is especially useful
for defining options. The DVQN algorithm uses an MSE reconstruction loss with
KL-divergence regularization, combined with traditional Q-learning updates.
The algorithm learns a latent space that represents good policies with state
clusters for options. We
show that the DVQN algorithm is a promising approach for identifying initiation
and termination conditions for option-based reinforcement learning. Experiments
show that the DVQN algorithm, with automatic initiation and termination, has
comparable performance to Rainbow and can maintain stability when trained for
extended periods after convergence.
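Since the abstract only describes the objective at a high level, the following is a minimal PyTorch-style sketch of how an MSE reconstruction term, a KL-divergence regularizer, and a standard Q-learning (temporal-difference) update could be combined into a single loss. The layer sizes, the KL weight `beta`, and the choice to attach the Q-head to the latent code are illustrative assumptions, not the authors' reference implementation.

```python
# Illustrative DVQN-style model and combined loss.
# All architectural choices and weights below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DVQN(nn.Module):
    def __init__(self, state_dim, latent_dim, n_actions):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)       # mean of q(z|s)
        self.logvar = nn.Linear(128, latent_dim)   # log-variance of q(z|s)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, state_dim))
        self.q_head = nn.Linear(latent_dim, n_actions)  # Q-values from latent code

    def forward(self, s):
        h = self.encoder(s)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.decoder(z), self.q_head(z), mu, logvar

def dvqn_loss(model, target_model, batch, gamma=0.99, beta=1e-3):
    # batch: states [B, D], actions [B] (long), rewards [B], next states, done flags [B]
    s, a, r, s_next, done = batch
    recon, q, mu, logvar = model(s)
    # Traditional Q-learning target from a frozen target network.
    with torch.no_grad():
        _, q_next, _, _ = target_model(s_next)
        td_target = r + gamma * (1 - done) * q_next.max(dim=1).values
    td_loss = F.smooth_l1_loss(q.gather(1, a.unsqueeze(1)).squeeze(1), td_target)
    # VAE terms: MSE reconstruction plus KL(q(z|s) || N(0, I)) regularization.
    recon_loss = F.mse_loss(recon, s)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return td_loss + recon_loss + beta * kl
```

Because the KL term pulls latent codes toward a shared Gaussian prior, states requiring similar behavior tend to cluster in the latent space; one simple way (not necessarily the authors') to turn this into options would be to run k-means on the latent means and treat cluster membership as candidate initiation and termination regions.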
Related papers
- Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning [0.0]
Successive Over-Relaxation (SOR) Q-learning, which introduces a relaxation factor to speed up convergence, has two major limitations.
We propose a sample-based, model-free double SOR Q-learning algorithm.
The proposed algorithm is extended to large-scale problems using deep RL.
arXiv Detail & Related papers (2024-09-10T09:23:03Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for turning static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of joint beamforming parameter optimization in networks.
We show that a policy can be learned from previously collected data alone, without real-world exploration, and then deployed.
arXiv Detail & Related papers (2023-10-12T18:36:36Z)
- Sequential Knockoffs for Variable Selection in Reinforcement Learning [19.925653053430395]
We introduce the notion of a minimal sufficient state in a Markov decision process (MDP).
We propose a novel SEquEntial Knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics.
arXiv Detail & Related papers (2023-03-24T21:39:06Z)
- Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning [43.562783189118]
We introduce a practical algorithm for incorporating human insight to speed learning.
Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy.
In all cases, CSRL learns a good policy faster than baselines.
arXiv Detail & Related papers (2021-12-30T22:02:42Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with this censored-feedback setting.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up new possibilities for robust and fast phase retrieval (PR).
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration (an illustrative sketch of both ingredients appears after this list).
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
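As referenced in the SUNRISE entry above, here is a short illustrative sketch of its two ingredients: UCB-style action selection over a Q-ensemble and an uncertainty-weighted Bellman backup. The weighting function, the hyperparameters, and the use of the ensemble mean as the TD target are simplifying assumptions, not the paper's exact formulation.

```python
# Illustrative SUNRISE-style ensemble mechanisms (simplified; assumptions noted).
import torch
import torch.nn.functional as F

def ucb_action(q_ensemble, state, lam=1.0):
    # q_ensemble: list of Q-networks, each mapping states -> [batch, n_actions].
    qs = torch.stack([q(state) for q in q_ensemble])   # [N, batch, n_actions]
    mean, std = qs.mean(dim=0), qs.std(dim=0)
    return (mean + lam * std).argmax(dim=1)            # optimistic (UCB) action

def weighted_td_loss(q_net, q_targets, batch, gamma=0.99, temperature=10.0):
    s, a, r, s_next, done = batch
    with torch.no_grad():
        next_qs = torch.stack([t(s_next).max(dim=1).values for t in q_targets])
        # Simplification: ensemble-mean target instead of per-member targets.
        target = r + gamma * (1 - done) * next_qs.mean(dim=0)
        # Down-weight samples where the target ensemble disagrees (high std).
        w = torch.sigmoid(-next_qs.std(dim=0) * temperature) + 0.5
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return (w * F.mse_loss(q_sa, target, reduction="none")).mean()
```

The intuition is that ensemble disagreement (the std term) serves both as an exploration bonus at action-selection time and as a down-weighting signal for unreliable Bellman targets at training time.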