Lookahead-Bounded Q-Learning
- URL: http://arxiv.org/abs/2006.15690v1
- Date: Sun, 28 Jun 2020 19:50:55 GMT
- Title: Lookahead-Bounded Q-Learning
- Authors: Ibrahim El Shar, Daniel R. Jiang
- Abstract summary: We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning.
Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
- Score: 8.738692817482526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new,
provably convergent variant of Q-learning that seeks to improve the performance
of standard Q-learning in stochastic environments through the use of
"lookahead" upper and lower bounds. To do this, LBQL employs previously
collected experience and each iteration's state-action values as dual feasible
penalties to construct a sequence of sampled information relaxation problems.
The solutions to these problems provide estimated upper and lower bounds on the
optimal value, which we track via stochastic approximation. These quantities
are then used to constrain the iterates to stay within the bounds at every
iteration. Numerical experiments on benchmark problems show that LBQL exhibits
faster convergence and more robustness to hyperparameters when compared to
standard Q-learning and several related techniques. Our approach is
particularly appealing in problems that require expensive simulations or
real-world interactions.
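To make the mechanism concrete, below is a minimal tabular sketch of the idea, assuming a toy environment interface (reset() -> state, step(a) -> (state, reward, done), sample_action() -> random action) and rewards bounded by r_max. The bound targets here are simple interval propagations from successor bounds, a stand-in chosen for brevity; in LBQL proper the targets come from sampled information relaxation problems penalized with the current Q-values.

```python
import numpy as np

def lbql_sketch(env, n_states, n_actions, episodes=500,
                alpha=0.1, beta=0.05, gamma=0.95, eps=0.1, r_max=1.0):
    """Illustrative sketch of lookahead-bounded Q-learning (LBQL).

    Q is the usual Q-learning iterate; U and L are running upper/lower
    bound estimates tracked by stochastic approximation, and the defining
    LBQL step is the projection of Q into [L, U] after each update.
    The bound targets below are interval-style stand-ins, not the paper's
    information relaxation bounds.
    """
    Q = np.zeros((n_states, n_actions))
    U = np.full((n_states, n_actions), r_max / (1.0 - gamma))   # optimistic init
    L = np.full((n_states, n_actions), -r_max / (1.0 - gamma))  # pessimistic init

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy behavior policy
            a = env.sample_action() if np.random.rand() < eps else int(Q[s].argmax())
            s2, r, done = env.step(a)

            # standard Q-learning update
            q_target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (q_target - Q[s, a])

            # stochastic-approximation tracking of the bounds (stand-in targets)
            u_target = r + (0.0 if done else gamma * U[s2].max())
            l_target = r + (0.0 if done else gamma * L[s2].max())
            U[s, a] += beta * (u_target - U[s, a])
            L[s, a] += beta * (l_target - L[s, a])

            # the LBQL constraint: keep the iterate inside the bounds
            Q[s, a] = float(np.clip(Q[s, a], L[s, a], U[s, a]))
            s = s2
    return Q, L, U
```

The final projection step is what distinguishes this loop from vanilla Q-learning: noisy iterates that stray outside the estimated bounds are pulled back, which is the intuition behind the reported speed and robustness gains.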
Related papers
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z)
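As a rough illustration of "gradient-based updates through continuous relaxation," the sketch below relaxes a QUBO objective x^T Q x over x in {0,1}^n through a temperature-annealed sigmoid and follows the gradient. This is a generic stand-in, not the paper's QQA scheme; the function name, annealing schedule, and hyperparameters are all assumptions.

```python
import numpy as np

def relaxed_qubo_descent(Q, steps=2000, lr=0.05, t0=1.0, t1=0.05, seed=0):
    """Gradient descent on a continuous relaxation of a QUBO (sketch).

    Minimize x^T Q x over x in {0,1}^n, relaxed via x = sigmoid(z / T),
    with the temperature T annealed toward 0 so the relaxation sharpens
    to a near-binary solution.
    """
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    z = rng.normal(size=n)
    for step in range(steps):
        T = t0 + (t1 - t0) * step / (steps - 1)   # linear temperature anneal
        x = 1.0 / (1.0 + np.exp(-z / T))          # relaxed binary variables
        grad_x = (Q + Q.T) @ x                    # gradient of x^T Q x
        grad_z = grad_x * x * (1 - x) / T         # chain rule through the sigmoid
        z -= lr * grad_z
    return (x > 0.5).astype(int)                  # round to a binary solution
```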
- Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm that avoids importance sampling.
Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
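One plausible form of a two-step update, shown only as a sketch: bootstrap from the state two steps ahead using both intermediate rewards. How the paper avoids importance sampling on off-policy data is not captured here; the argument layout and step sizes are assumptions.

```python
import numpy as np

def two_step_q_update(Q, s, a, r1, r2, s2, done, alpha=0.1, gamma=0.95):
    """Sketch of a two-step Q-learning update: the target combines the two
    consecutive rewards r1, r2 with a bootstrap from the state s2 reached
    two steps after taking action a in state s."""
    target = r1 + gamma * r2 + (0.0 if done else gamma**2 * Q[s2].max())
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```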
- Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z)
- Classical-to-Quantum Transfer Learning Facilitates Machine Learning with Variational Quantum Circuit [62.55763504085508]
We prove that a classical-to-quantum transfer learning architecture using a Variational Quantum Circuit (VQC) improves the representation and generalization (estimation error) capabilities of the VQC model.
We show that the architecture of classical-to-quantum transfer learning leverages pre-trained classical generative AI models, making it easier to find the optimal parameters for the VQC in the training stage.
arXiv Detail & Related papers (2023-05-18T03:08:18Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, which frames single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
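A hedged sketch of the bang-bang-plus-decomposition idea: each action dimension keeps its own two-bin Q-head, and the joint value is the mean of the per-dimension values, so greedy action selection factorizes across dimensions instead of searching an exponential joint action space. The q_values arrays and their (n_dims, 2) shape are assumed interfaces, not the paper's code.

```python
import numpy as np

def greedy_bang_bang_action(q_values, action_scale=1.0):
    """q_values: assumed (n_dims, 2) array, one small Q-head per action
    dimension over the two extreme ("bang-bang") actions {-scale, +scale}.
    Greedy selection is an independent argmax per dimension."""
    bins = np.array([-action_scale, action_scale])
    return bins[q_values.argmax(axis=1)]

def decomposed_td_target(next_q_values, reward, gamma=0.99, done=False):
    """Joint next-state value as the mean of per-dimension maxima: the
    'value decomposition' step, in the spirit of cooperative MARL mixing."""
    v_next = next_q_values.max(axis=1).mean()
    return reward + (0.0 if done else gamma * v_next)
```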
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
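The reverse-experience-replay ingredient can be sketched in a few tabular lines: replaying an episode from last transition to first lets freshly updated successor values feed straight into predecessor targets. This omits Q-Rex's online target updates and linear function approximation; the tabular setting and transition layout are assumptions.

```python
import numpy as np

def reverse_replay_sweep(Q, episode, alpha=0.1, gamma=0.95):
    """Replay a stored episode backwards. episode: list of transitions
    (s, a, r, s2, done) in the order they were experienced."""
    for s, a, r, s2, done in reversed(episode):
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
    return Q
```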
- Accelerating Quadratic Optimization with Reinforcement Learning [39.64039435793601]
We show how reinforcement learning can learn a policy that tunes solver parameters to accelerate convergence.
Our policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x.
RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications.
arXiv Detail & Related papers (2021-07-22T17:59:10Z)
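To show the kind of parameter a learned policy would tune, here is a hand-crafted stand-in: rescaling an ADMM penalty parameter rho by the primal/dual residual ratio, a common heuristic in first-order QP solvers. RLQP would replace such a fixed rule with a trained RL policy; this function, its arguments, and its bounds are assumptions for illustration.

```python
import numpy as np

def rho_update_policy(primal_res, dual_res, rho, lo=1e-6, hi=1e6):
    """Fixed heuristic stand-in for a learned parameter-tuning policy:
    grow rho when the primal residual dominates, shrink it when the dual
    residual dominates, clipped to a safe range."""
    scale = np.sqrt(primal_res / max(dual_res, 1e-12))
    return float(np.clip(rho * scale, lo, hi))
```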
- Q-Match: Iterative Shape Matching via Quantum Annealing [64.74942589569596]
Finding shape correspondences can be formulated as an NP-hard quadratic assignment problem (QAP).
This paper proposes Q-Match, a new iterative quantum method for QAPs inspired by the alpha-expansion algorithm.
Q-Match can be applied for shape matching problems iteratively, on a subset of well-chosen correspondences, allowing us to scale to real-world problems.
arXiv Detail & Related papers (2021-05-06T17:59:38Z)
- Ensemble Bootstrapping for Q-Learning [15.07549655582389]
We introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL).
EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables.
We show that there exist domains where both over and under-estimation result in sub-optimal performance.
arXiv Detail & Related papers (2021-02-28T10:19:47Z)
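In the spirit of double Q-learning, a hedged sketch of an ensemble-bootstrapped update: one randomly chosen member is updated, it selects the maximizing next action, and the mean of the remaining members evaluates that action, which is the bias-reduction mechanism the abstract alludes to. The paper's exact update may differ; the data layout here is an assumption.

```python
import numpy as np

def ebql_style_update(Qs, s, a, r, s2, done, alpha=0.1, gamma=0.95, rng=None):
    """Qs: list of (n_states, n_actions) Q-tables forming the ensemble.
    Selection and evaluation are decoupled across ensemble members to
    reduce the overestimation bias of the max operator."""
    rng = rng or np.random.default_rng()
    k = rng.integers(len(Qs))                                  # member to update
    others = np.mean([Q for j, Q in enumerate(Qs) if j != k], axis=0)
    a_star = int(Qs[k][s2].argmax())                           # selection by member k
    target = r + (0.0 if done else gamma * others[s2, a_star]) # evaluation by the rest
    Qs[k][s, a] += alpha * (target - Qs[k][s, a])
    return Qs
```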
- Single-partition adaptive Q-learning [0.0]
Single-partition adaptive Q-learning (SPAQL) is an algorithm for model-free episodic reinforcement learning.
Tests on episodes with a large number of time steps show that SPAQL scales without difficulty, unlike adaptive Q-learning (AQL).
We claim that SPAQL may have a higher sample efficiency than AQL, thus being a relevant contribution to the field of efficient model-free RL methods.
arXiv Detail & Related papers (2020-07-14T00:03:25Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.