Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2007.07582v1
- Date: Wed, 15 Jul 2020 10:01:32 GMT
- Title: Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep
Reinforcement Learning
- Authors: Sabrina Hoppe and Marc Toussaint
- Abstract summary: In state-of-the-art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates.
We represent these transitions in a data graph and link its structure to soft divergence.
We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem.
- Score: 33.31762612175859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In state-of-the-art model-free off-policy deep reinforcement learning, a
replay memory is used to store past experience and derive all network updates.
Even if both state and action spaces are continuous, the replay memory only
holds a finite number of transitions. We represent these transitions in a data
graph and link its structure to soft divergence. By selecting a subgraph with a
favorable structure, we construct a simplified Markov Decision Process for
which exact Q-values can be computed efficiently as more data comes in. The
subgraph and its associated Q-values can be represented as a QGraph. We show
that the Q-value for each transition in the simplified MDP is a lower bound of
the Q-value for the same transition in the original continuous Q-learning
problem. By using these lower bounds in temporal difference learning, our
method QG-DDPG is less prone to soft divergence and exhibits increased sample
efficiency while being more robust to hyperparameters. QGraphs also retain
information from transitions that have already been overwritten in the replay
memory, which can decrease the algorithm's sensitivity to the replay memory
capacity.
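To make the mechanism described in the abstract concrete, here is a minimal sketch, not the authors' implementation: stored transitions form the edges of a finite graph, exact Q-values of the resulting simplified MDP are computed by dynamic programming, and those values are used as lower bounds that clip the TD target from below. The names (QGraph, bounded_td_target), the use of hashable state/action identifiers in place of the continuous spaces, and the choice to apply the bound by clipping are all illustrative assumptions.

```python
# Illustrative sketch only: a finite graph of stored transitions ("QGraph"),
# exact Q-values on the simplified MDP via dynamic programming, and a TD target
# clipped from below by those values. Names and structure are assumptions.
from collections import defaultdict

GAMMA = 0.99


class QGraph:
    """Finite graph of observed transitions with exact Q-values on its simplified MDP."""

    def __init__(self, gamma=GAMMA):
        self.gamma = gamma
        self.edges = {}                    # (state, action) -> (reward, next_state, done)
        self.actions = defaultdict(set)    # state -> actions observed in that state
        self.q = defaultdict(float)        # exact Q-values of the simplified MDP

    def add_transition(self, s, a, r, s_next, done):
        self.edges[(s, a)] = (r, s_next, done)
        self.actions[s].add(a)

    def value_iteration(self, sweeps=50):
        """Exact dynamic programming on the finite, deterministic transition graph."""
        for _ in range(sweeps):
            for (s, a), (r, s_next, done) in self.edges.items():
                bootstrap = 0.0
                if not done and self.actions[s_next]:
                    bootstrap = max(self.q[(s_next, a2)] for a2 in self.actions[s_next])
                self.q[(s, a)] = r + self.gamma * bootstrap

    def lower_bound(self, s, a):
        """Per the paper's result, this value lower-bounds the Q-value of the original problem."""
        return self.q.get((s, a), float("-inf"))


def bounded_td_target(r, q_next, qgraph, s, a, done, gamma=GAMMA):
    """Ordinary one-step TD target, clipped from below by the QGraph value."""
    target = r + (0.0 if done else gamma * q_next)
    return max(target, qgraph.lower_bound(s, a))


if __name__ == "__main__":
    g = QGraph()
    # Toy chain: s0 --a0--> s1 --a0--> terminal s2, reward 1 on the final step.
    g.add_transition(0, 0, 0.0, 1, False)
    g.add_transition(1, 0, 1.0, 2, True)
    g.value_iteration()
    # A diverging critic might return an absurdly low bootstrap estimate (-100 here);
    # the lower bound keeps the target at the exact value of the simplified MDP (0.99).
    print(bounded_td_target(r=0.0, q_next=-100.0, qgraph=g, s=0, a=0, done=False))
```

In QG-DDPG the bound would be applied inside the critic update of DDPG; here a scalar q_next stands in for the target critic's bootstrap estimate.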
Related papers
- Linear Regression Using Quantum Annealing with Continuous Variables [0.0]
The boson system facilitates the optimization of linear regression without resorting to discrete approximations.
The major benefit of our new approach is that it can ensure accuracy without increasing the number of qubits as long as the adiabatic condition is satisfied.
arXiv Detail & Related papers (2024-10-11T06:49:09Z)
- CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering [33.89497991289916]
We propose a novel rewriting method CoTKR, Chain-of-Thought Enhanced Knowledge Rewriting, for generating reasoning traces and corresponding knowledge in an interleaved manner.
We conduct experiments using various Large Language Models (LLMs) across several Knowledge Graph Question Answering (KGQA) benchmarks.
arXiv Detail & Related papers (2024-09-29T16:08:45Z)
- Weight Re-Mapping for Variational Quantum Algorithms [54.854986762287126]
We introduce the concept of weight re-mapping for variational quantum circuits (VQCs).
We employ seven distinct weight re-mapping functions to assess their impact on eight classification datasets.
Our results indicate that weight re-mapping can enhance the convergence speed of the VQC.
arXiv Detail & Related papers (2023-06-09T09:42:21Z)
- Improving Convergence for Quantum Variational Classifiers using Weight Re-Mapping [60.086820254217336]
In recent years, quantum machine learning has seen a substantial increase in the use of variational quantum circuits (VQCs).
We introduce weight re-mapping for VQCs to unambiguously map the weights to an interval of length $2\pi$.
We demonstrate that weight re-mapping increased test accuracy for the Wine dataset by $10\%$ over using unmodified weights.
arXiv Detail & Related papers (2022-12-22T13:23:19Z)
- Topological Experience Replay [22.84244156916668]
Deep Q-learning methods update Q-values using state transitions sampled from the experience replay buffer.
We organize the agent's experience into a graph that explicitly tracks the dependency between Q-values of states (a simplified sketch of such graph-based backups appears after this list).
We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks.
arXiv Detail & Related papers (2022-03-29T18:28:20Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Characterizing the loss landscape of variational quantum circuits [77.34726150561087]
We introduce a way to compute the Hessian of the loss function of VQCs.
We show how this information can be interpreted and compared to classical neural networks.
arXiv Detail & Related papers (2020-08-06T17:48:12Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
- Q-Learning with Differential Entropy of Q-Tables [4.221871357181261]
We conjecture that the reduction in performance during prolonged training sessions of Q-learning is caused by a loss of information.
We introduce Differential Entropy of Q-tables (DE-QT) as an external information loss detector to the Q-learning algorithm.
arXiv Detail & Related papers (2020-06-26T04:37:10Z)
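As referenced in the Topological Experience Replay entry above, the following toy sketch shows, under strong simplifications, what it can mean to organize experience into a graph and back Q-values up in reverse order from terminal states. It is a tabular illustration with assumed names (topological_backups), not the algorithm of that paper or of the QGraph method, both of which operate alongside neural Q-functions.

```python
# Toy, tabular illustration (assumed names): store transitions as graph edges and
# back Q-values up breadth-first in reverse from terminal states, so every update
# bootstraps from successors that were refreshed just before it.
from collections import defaultdict, deque

GAMMA = 0.99


def topological_backups(transitions, gamma=GAMMA):
    """transitions: iterable of (s, a, r, s_next, done); returns tabular Q-values."""
    incoming = defaultdict(list)   # s_next -> edges (s, a, r, done) ending there
    actions = defaultdict(set)     # s -> actions observed in s
    terminals = set()
    for s, a, r, s_next, done in transitions:
        incoming[s_next].append((s, a, r, done))
        actions[s].add(a)
        if done:
            terminals.add(s_next)

    q = defaultdict(float)
    frontier = deque(terminals)
    visited = set()
    while frontier:
        state = frontier.popleft()
        if state in visited:
            continue
        visited.add(state)
        # Back up every edge leading into this state, then continue upstream.
        for s, a, r, done in incoming[state]:
            bootstrap = 0.0 if done else max(
                (q[(state, a2)] for a2 in actions[state]), default=0.0)
            q[(s, a)] = max(q[(s, a)], r + gamma * bootstrap)
            frontier.append(s)
    return dict(q)


if __name__ == "__main__":
    chain = [(0, 0, 0.0, 1, False), (1, 0, 0.0, 2, False), (2, 0, 1.0, 3, True)]
    # Values propagate backwards from the terminal state, e.g. q[(0, 0)] ≈ 0.98.
    print(topological_backups(chain))
```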
This list is automatically generated from the titles and abstracts of the papers in this site.