Lifting the Veil: Unlocking the Power of Depth in Q-learning
- URL: http://arxiv.org/abs/2310.17915v1
- Date: Fri, 27 Oct 2023 06:15:33 GMT
- Title: Lifting the Veil: Unlocking the Power of Depth in Q-learning
- Authors: Shao-Bo Lin, Tao Li, Shaojie Tang, Yao Wang, Ding-Xuan Zhou
- Abstract summary: Deep Q-learning has been widely used in operations research and management science.
This paper theoretically verifies the power of depth in deep Q-learning.
- Score: 31.700583180829106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the help of massive data and rich computational resources, deep
Q-learning has been widely used in operations research and management science
and has contributed to great success in numerous applications, including
recommender systems, supply chains, games, and robotic manipulation. However,
the success of deep Q-learning lacks solid theoretical verification and
interpretability. The aim of this paper is to theoretically verify the power of
depth in deep Q-learning. Within the framework of statistical learning theory,
we rigorously prove that deep Q-learning outperforms its traditional version by
demonstrating its good generalization error bound. Our results reveal that the
main reason for the success of deep Q-learning is the excellent performance of
deep neural networks (deep nets) in capturing the special properties of rewards, namely spatial sparseness and piecewise constancy, rather than their large
capacities. In this paper, we make fundamental contributions to the field of
reinforcement learning by answering the following three questions: Why does
deep Q-learning perform so well? When does deep Q-learning perform better than
traditional Q-learning? How many samples are required to achieve a specific
prediction accuracy for deep Q-learning? Our theoretical assertions are
verified by applying deep Q-learning in the well-known beer game in supply
chain management and a simulated recommender system.
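To make the algorithm under analysis concrete, the following is a minimal sketch of a deep Q-learning update: a small neural network approximates Q(s, a) and is regressed toward the Bellman target r + gamma * max_a' Q(s', a'). The network architecture, the target-network trick, the hyperparameters, and the random batch of transitions are all illustrative assumptions of this sketch, not the paper's setup (the beer game and simulated recommender system experiments are not reproduced here).

```python
# Minimal sketch of one deep Q-learning (fitted-Q) step: a small neural network
# approximates Q(s, a) and is regressed toward the Bellman target
#   y = r + gamma * max_a' Q_target(s', a').
# Network width/depth, gamma, and the batch of transitions are illustrative
# assumptions, not the experimental setup used in the paper.
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99

# Online Q-network and a slowly updated target copy (standard DQN practice).
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                           nn.Linear(64, 64), nn.ReLU(),
                           nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def q_learning_step(states, actions, rewards, next_states, dones):
    """One gradient step on a batch of (s, a, r, s', done) transitions."""
    with torch.no_grad():
        # Bellman target: bootstrap from the best next action under the target net.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    # Q-values of the actions actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage on a random batch of 32 transitions.
batch = 32
loss = q_learning_step(
    states=torch.randn(batch, state_dim),
    actions=torch.randint(0, n_actions, (batch,)),
    rewards=torch.randn(batch),
    next_states=torch.randn(batch, state_dim),
    dones=torch.zeros(batch),
)
print(f"TD regression loss: {loss:.4f}")
```

A tabular counterpart, the traditional Q-learning that the abstract contrasts deep Q-learning with, is sketched after the related-papers list below.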
Related papers
- ShadowNet for Data-Centric Quantum System Learning [188.683909185536]
We propose a data-centric learning paradigm combining the strengths of neural-network protocols and classical shadows.
Capitalizing on the generalization power of neural networks, this paradigm can be trained offline and excel at predicting previously unseen systems.
We present the instantiation of our paradigm in quantum state tomography and direct fidelity estimation tasks and conduct numerical analysis up to 60 qubits.
arXiv Detail & Related papers (2023-08-22T09:11:53Z) - Quantum Imitation Learning [74.15588381240795]
We propose quantum imitation learning (QIL) with the hope of utilizing quantum advantage to speed up IL.
We develop two QIL algorithms, quantum behavioural cloning (Q-BC) and quantum generative adversarial imitation learning (Q-GAIL).
Experimental results demonstrate that both Q-BC and Q-GAIL can achieve performance comparable to their classical counterparts.
arXiv Detail & Related papers (2023-04-04T12:47:35Z) - Extreme Q-Learning: MaxEnt RL without Entropy [88.97516083146371]
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains.
We introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT).
Using EVT, we derive our Extreme Q-Learning framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms.
arXiv Detail & Related papers (2023-01-05T23:14:38Z) - Prerequisite-driven Q-matrix Refinement for Learner Knowledge
Assessment: A Case Study in Online Learning Context [2.221779410386775]
We propose a prerequisite-driven Q-matrix refinement framework for learner knowledge assessment (PQRLKA) in online context.
We infer the prerequisites from learners' response data and use them to refine the expert-defined Q-matrix.
Based on the refined Q-matrix, we propose a Metapath2Vec enhanced convolutional representation method to obtain the comprehensive representations of the items.
arXiv Detail & Related papers (2022-08-24T08:44:08Z) - Efficient Off-Policy Reinforcement Learning via Brain-Inspired Computing [9.078553427792183]
We propose QHD, an off-policy, value-based hyperdimensional reinforcement learning method that mimics brain properties toward robust and real-time learning.
QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment.
Our evaluation shows QHD's capability for real-time learning, providing a 34.6x speedup and significantly better quality of learning than DQN.
arXiv Detail & Related papers (2022-05-14T05:50:54Z) - Deep Reinforcement Learning with Spiking Q-learning [51.386945803485084]
Spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption.
Combining SNNs with deep reinforcement learning (RL) provides a promising, energy-efficient approach to realistic control tasks.
arXiv Detail & Related papers (2022-01-21T16:42:11Z) - Transferability in Deep Learning: A Survey [80.67296873915176]
The ability to acquire and reuse knowledge is known as transferability in deep learning.
We present this survey to connect different isolated areas in deep learning with their relation to transferability.
We implement a benchmark and an open-source library, enabling a fair evaluation of deep learning methods in terms of transferability.
arXiv Detail & Related papers (2022-01-15T15:03:17Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently
finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - An Elementary Proof that Q-learning Converges Almost Surely [1.52292571922932]
Watkins' and Dayan's Q-learning is a model-free reinforcement learning algorithm.
This paper reproduces a precise and (nearly) self-contained proof that Q-learning converges.
arXiv Detail & Related papers (2021-08-05T19:32:26Z) - Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples [8.938418994111716]
Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning.
An offline expert assesses the value of a state in a coarse manner using three discrete values.
Our results show that Expert Q-learning is indeed useful and more resistant to the overestimation bias.
arXiv Detail & Related papers (2021-06-28T12:41:45Z) - Deep Q-Learning: Theoretical Insights from an Asymptotic Analysis [3.9871041399267613]
Deep Q-Learning is an important reinforcement learning algorithm, which involves training a deep neural network to approximate the well-known Q-function.
Although deep Q-learning is wildly successful under laboratory conditions, serious gaps between theory and practice, as well as a lack of formal guarantees, prevent its use in the real world.
We provide a theoretical analysis of a popular version of Deep Q-Learning under realistic, verifiable assumptions.
arXiv Detail & Related papers (2020-08-25T07:59:20Z)
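For contrast with the deep variant sketched above, here is a minimal tabular version of Watkins' Q-learning, the "traditional" Q-learning that the abstract compares deep Q-learning against and that the convergence-proof entry above refers to. The toy 5-state chain environment, the uniformly random behavior policy, and all hyperparameters are illustrative assumptions, not taken from any of the listed papers.

```python
# Minimal sketch of tabular (Watkins') Q-learning on a toy 5-state chain.
# The environment and hyperparameters are illustrative assumptions only.
import numpy as np

n_states, n_actions = 5, 2            # chain 0..4; action 0 = left, 1 = right
gamma, alpha = 0.95, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def step(s, a):
    """Deterministic chain: reward 1 only on reaching the rightmost (terminal) state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, reward, done

for episode in range(500):
    s, done = 0, False
    while not done:
        # Q-learning is off-policy, so a uniformly random behavior policy suffices here.
        a = int(rng.integers(n_actions))
        s_next, r, done = step(s, a)
        # Watkins' update: Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(np.round(Q, 2))  # learned state-action values; the greedy policy moves right
```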