Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2007.07582v1
- Date: Wed, 15 Jul 2020 10:01:32 GMT
- Title: Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep
Reinforcement Learning
- Authors: Sabrina Hoppe and Marc Toussaint
- Abstract summary: In state-of-the-art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates.
We represent these transitions in a data graph and link its structure to soft divergence.
We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem.
- Score: 33.31762612175859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In state-of-the-art model-free off-policy deep reinforcement learning, a
replay memory is used to store past experience and derive all network updates.
Even if both state and action spaces are continuous, the replay memory only
holds a finite number of transitions. We represent these transitions in a data
graph and link its structure to soft divergence. By selecting a subgraph with a
favorable structure, we construct a simplified Markov Decision Process for
which exact Q-values can be computed efficiently as more data comes in. The
subgraph and its associated Q-values can be represented as a QGraph. We show
that the Q-value for each transition in the simplified MDP is a lower bound of
the Q-value for the same transition in the original continuous Q-learning
problem. By using these lower bounds in temporal difference learning, our
method QG-DDPG is less prone to soft divergence and exhibits increased sample
efficiency while being more robust to hyperparameters. QGraphs also retain
information from transitions that have already been overwritten in the replay
memory, which can decrease the algorithm's sensitivity to the replay memory
capacity.
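To make the mechanism described in the abstract concrete, here is a minimal sketch, not the authors' implementation: stored transitions form the edges of a finite graph, exact Q-values of the resulting simplified MDP are computed by dynamic programming, and those values are used as lower bounds that clip the TD target from below. The names (QGraph, bounded_td_target), the use of hashable state/action identifiers in place of the continuous spaces, and the choice to apply the bound by clipping are all illustrative assumptions.

```python
# Illustrative sketch only: a finite graph of stored transitions ("QGraph"),
# exact Q-values on the simplified MDP via dynamic programming, and a TD target
# clipped from below by those values. Names and structure are assumptions.
from collections import defaultdict

GAMMA = 0.99


class QGraph:
    """Finite graph of observed transitions with exact Q-values on its simplified MDP."""

    def __init__(self, gamma=GAMMA):
        self.gamma = gamma
        self.edges = {}                    # (state, action) -> (reward, next_state, done)
        self.actions = defaultdict(set)    # state -> actions observed in that state
        self.q = defaultdict(float)        # exact Q-values of the simplified MDP

    def add_transition(self, s, a, r, s_next, done):
        self.edges[(s, a)] = (r, s_next, done)
        self.actions[s].add(a)

    def value_iteration(self, sweeps=50):
        """Exact dynamic programming on the finite, deterministic transition graph."""
        for _ in range(sweeps):
            for (s, a), (r, s_next, done) in self.edges.items():
                bootstrap = 0.0
                if not done and self.actions[s_next]:
                    bootstrap = max(self.q[(s_next, a2)] for a2 in self.actions[s_next])
                self.q[(s, a)] = r + self.gamma * bootstrap

    def lower_bound(self, s, a):
        """Per the paper's result, this value lower-bounds the Q-value of the original problem."""
        return self.q.get((s, a), float("-inf"))


def bounded_td_target(r, q_next, qgraph, s, a, done, gamma=GAMMA):
    """Ordinary one-step TD target, clipped from below by the QGraph value."""
    target = r + (0.0 if done else gamma * q_next)
    return max(target, qgraph.lower_bound(s, a))


if __name__ == "__main__":
    g = QGraph()
    # Toy chain: s0 --a0--> s1 --a0--> terminal s2, reward 1 on the final step.
    g.add_transition(0, 0, 0.0, 1, False)
    g.add_transition(1, 0, 1.0, 2, True)
    g.value_iteration()
    # A diverging critic might return an absurdly low bootstrap estimate (-100 here);
    # the lower bound keeps the target at the exact value of the simplified MDP (0.99).
    print(bounded_td_target(r=0.0, q_next=-100.0, qgraph=g, s=0, a=0, done=False))
```

In QG-DDPG the bound would be applied inside the critic update of DDPG; here a scalar q_next stands in for the target critic's bootstrap estimate.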
Related papers
- Linear Regression Using Quantum Annealing with Continuous Variables [0.0]
The boson system facilitates the optimization of linear regression without resorting to discrete approximations.
The major benefit of our new approach is that it can ensure accuracy without increasing the number of qubits as long as the adiabatic condition is satisfied.
arXiv Detail & Related papers (2024-10-11T06:49:09Z)
- CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering [33.89497991289916]
We propose a novel rewriting method CoTKR, Chain-of-Thought Enhanced Knowledge Rewriting, for generating reasoning traces and corresponding knowledge in an interleaved manner.
We conduct experiments using various Large Language Models (LLMs) across several Knowledge Graph Question Answering (KGQA) benchmarks.
arXiv Detail & Related papers (2024-09-29T16:08:45Z)
- Weight Re-Mapping for Variational Quantum Algorithms [54.854986762287126]
We introduce the concept of weight re-mapping for variational quantum circuits (VQCs).
We employ seven distinct weight re-mapping functions to assess their impact on eight classification datasets.
Our results indicate that weight re-mapping can enhance the convergence speed of the VQC.
arXiv Detail & Related papers (2023-06-09T09:42:21Z)
- Improving Convergence for Quantum Variational Classifiers using Weight Re-Mapping [60.086820254217336]
In recent years, quantum machine learning has seen a substantial increase in the use of variational quantum circuits (VQCs).
We introduce weight re-mapping for VQCs to unambiguously map the weights to an interval of length $2\pi$.
We demonstrate that weight re-mapping increased test accuracy for the Wine dataset by $10\%$ over using unmodified weights.
arXiv Detail & Related papers (2022-12-22T13:23:19Z)
- Topological Experience Replay [22.84244156916668]
Deep Q-learning methods update Q-values using state transitions sampled from the experience replay buffer.
We organize the agent's experience into a graph that explicitly tracks the dependency between Q-values of states (a simplified sketch of such graph-based backups appears after this list).
We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks.
arXiv Detail & Related papers (2022-03-29T18:28:20Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Characterizing the loss landscape of variational quantum circuits [77.34726150561087]
We introduce a way to compute the Hessian of the loss function of VQCs.
We show how this information can be interpreted and compared to classical neural networks.
arXiv Detail & Related papers (2020-08-06T17:48:12Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
- Q-Learning with Differential Entropy of Q-Tables [4.221871357181261]
We conjecture that the reduction in performance during prolonged training sessions of Q-learning is caused by a loss of information.
We introduce Differential Entropy of Q-tables (DE-QT) as an external information loss detector to the Q-learning algorithm.
arXiv Detail & Related papers (2020-06-26T04:37:10Z)
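As referenced in the Topological Experience Replay entry above, the following toy sketch shows, under strong simplifications, what it can mean to organize experience into a graph and back Q-values up in reverse order from terminal states. It is a tabular illustration with assumed names (topological_backups), not the algorithm of that paper or of the QGraph method, both of which operate alongside neural Q-functions.

```python
# Toy, tabular illustration (assumed names): store transitions as graph edges and
# back Q-values up breadth-first in reverse from terminal states, so every update
# bootstraps from successors that were refreshed just before it.
from collections import defaultdict, deque

GAMMA = 0.99


def topological_backups(transitions, gamma=GAMMA):
    """transitions: iterable of (s, a, r, s_next, done); returns tabular Q-values."""
    incoming = defaultdict(list)   # s_next -> edges (s, a, r, done) ending there
    actions = defaultdict(set)     # s -> actions observed in s
    terminals = set()
    for s, a, r, s_next, done in transitions:
        incoming[s_next].append((s, a, r, done))
        actions[s].add(a)
        if done:
            terminals.add(s_next)

    q = defaultdict(float)
    frontier = deque(terminals)
    visited = set()
    while frontier:
        state = frontier.popleft()
        if state in visited:
            continue
        visited.add(state)
        # Back up every edge leading into this state, then continue upstream.
        for s, a, r, done in incoming[state]:
            bootstrap = 0.0 if done else max(
                (q[(state, a2)] for a2 in actions[state]), default=0.0)
            q[(s, a)] = max(q[(s, a)], r + gamma * bootstrap)
            frontier.append(s)
    return dict(q)


if __name__ == "__main__":
    chain = [(0, 0, 0.0, 1, False), (1, 0, 0.0, 2, False), (2, 0, 1.0, 3, True)]
    # Values propagate backwards from the terminal state, e.g. q[(0, 0)] ≈ 0.98.
    print(topological_backups(chain))
```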
This list is automatically generated from the titles and abstracts of the papers in this site.