Accelerating Quadratic Optimization with Reinforcement Learning
- URL: http://arxiv.org/abs/2107.10847v1
- Date: Thu, 22 Jul 2021 17:59:10 GMT
- Title: Accelerating Quadratic Optimization with Reinforcement Learning
- Authors: Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac,
Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg
- Abstract summary: We show how Reinforcement Learning can learn a policy to tune parameters to accelerate convergence.
Our policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x.
RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications.
- Score: 39.64039435793601
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: First-order methods for quadratic optimization such as OSQP are widely used
for large-scale machine learning and embedded optimal control, where many
related problems must be rapidly solved. These methods face two persistent
challenges: manual hyperparameter tuning and convergence time to high-accuracy
solutions. To address these, we explore how Reinforcement Learning (RL) can
learn a policy to tune parameters to accelerate convergence. In experiments
with well-known QP benchmarks we find that our RL policy, RLQP, significantly
outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes
surprisingly well to previously unseen problems with varying dimension and
structure from different applications, including the QPLIB, Netlib LP and
Maros-Meszaros problems. Code for RLQP is available at
https://github.com/berkeleyautomation/rlqp.
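To make the idea concrete, the following is a minimal sketch, assuming an ADMM-style QP solve loop in which a learned policy maps per-constraint residuals to updated penalty parameters; the callback interface and policy signature are illustrative assumptions, not the RLQP API (see the linked repository for the actual implementation).

```python
import numpy as np

# Minimal sketch (not the RLQP API): an ADMM-style QP solve loop where a
# learned policy adapts the penalty parameter rho from per-constraint
# residuals. step_fn, residual_fn, and policy are assumed callbacks.

def solve_qp_with_learned_rho(step_fn, residual_fn, policy, rho,
                              max_iters=4000, eps=1e-4, adapt_every=10):
    """step_fn(rho): run one ADMM iteration with the given penalty vector.
    residual_fn(): return (primal, dual) residual vectors.
    policy(features): map residual features to a new penalty vector."""
    for k in range(max_iters):
        step_fn(rho)                                   # one ADMM iteration
        r_prim, r_dual = residual_fn()
        if max(np.max(r_prim), np.max(r_dual)) < eps:  # converged to tolerance
            return k
        if k % adapt_every == 0:                       # periodic rho adaptation
            features = np.stack([r_prim, r_dual], axis=1)
            rho = policy(features)                     # learned update of rho
    return max_iters
```

In this sketch the learned policy plays the role of the hand-tuned rho-update heuristic in a standard first-order solver; the rest of the loop is unchanged.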
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
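For illustration only, here is a minimal sketch of the kind of token-level soft Bellman update that a SAC-style formulation such as DQO builds on; reading the model's output logits as Q-values and the single-sample shapes are simplifying assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch: token-level soft Bellman target and TD loss, with the
# language model's output logits interpreted as Q-values over the vocabulary.

def soft_bellman_target(next_logits, reward, done, alpha=0.1, gamma=1.0):
    """next_logits: [vocab_size] Q-values at the next token position."""
    next_v = alpha * torch.logsumexp(next_logits / alpha, dim=-1)  # soft value V(s')
    return reward + gamma * (1.0 - done) * next_v

def td_loss(logits, action_token, target):
    """Regress Q(s, a) of the generated token toward the soft target."""
    return F.mse_loss(logits[action_token], target.detach())
```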
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning [18.579378919155864]
We propose Adaptive $Q$-Network (AdaQN) to account for the non-stationarity of the optimization procedure without requiring additional samples.
AdaQN is theoretically sound, and we validate it empirically on MuJoCo control problems and Atari 2600 games.
arXiv Detail & Related papers (2024-05-25T11:57:43Z) - SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning [89.04776523010409]
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics.
In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping.
We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI.
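As a rough illustration of the decomposition mentioned above, the sketch below writes each task's Q-function as a successor feature times a task-specific reward vector and applies GPI; the array shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative sketch: Q_i(s, a) = psi(s, a)^T w_i, and generalized policy
# improvement (GPI) acts greedily w.r.t. the best Q-value over known tasks.

def q_from_successor_features(psi, w):
    """psi: [num_actions, feat_dim] successor features at the current state.
    w:   [num_tasks, feat_dim] per-task reward mappings.
    returns: [num_tasks, num_actions] Q-values."""
    return w @ psi.T

def gpi_action(psi, w):
    q = q_from_successor_features(psi, w)
    return int(np.argmax(q.max(axis=0)))  # best action across all tasks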
arXiv Detail & Related papers (2024-05-24T20:30:14Z) - VQC-Based Reinforcement Learning with Data Re-uploading: Performance and
Trainability [0.0]
Reinforcement Learning (RL) consists of designing agents that make intelligent decisions without human supervision.
Deep Q-Learning, an RL algorithm that uses deep neural networks, has achieved super-human performance in some specific tasks.
It is also possible to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms.
arXiv Detail & Related papers (2024-01-21T18:00:15Z) - Accelerate Presolve in Large-Scale Linear Programming via Reinforcement
Learning [92.31528918811007]
We propose a simple and efficient reinforcement learning framework -- namely, reinforcement learning for presolve (RL4Presolve) -- to tackle (P1)-(P3) simultaneously.
Experiments on two solvers and eight benchmarks (real-world and synthetic) demonstrate that RL4Presolve significantly and consistently improves the efficiency of solving large-scale LPs.
arXiv Detail & Related papers (2023-10-18T09:51:59Z) - Hyperparameter Tuning for Deep Reinforcement Learning Applications [0.3553493344868413]
We propose a distributed variable-length genetic algorithm framework to tune hyperparameters for various RL applications.
Our results show that with more generations, the framework finds solutions that require fewer training episodes, are computationally cheaper, and are more robust for deployment.
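The sketch below shows a generic genetic-algorithm loop over RL hyperparameters, kept fixed-length and single-machine for brevity; the search space and the evaluate() interface are assumptions, not the paper's distributed variable-length framework.

```python
import random

# Illustrative sketch: a simple genetic algorithm over RL hyperparameters.
# evaluate(genome) is assumed to train an agent briefly and return its score.

SPACE = {"lr": [1e-4, 3e-4, 1e-3], "gamma": [0.95, 0.99, 0.999], "batch_size": [32, 64, 128]}

def random_genome():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(genome, rate=0.3):
    return {k: random.choice(SPACE[k]) if random.random() < rate else v
            for k, v in genome.items()}

def evolve(evaluate, pop_size=8, generations=5):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        elite = ranked[: pop_size // 2]                               # selection
        population = elite + [mutate(random.choice(elite)) for _ in elite]
    return max(population, key=evaluate)
```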
arXiv Detail & Related papers (2022-01-26T20:43:13Z) - Reinforcement Learning to Solve NP-hard Problems: an Application to the
CVRP [0.0]
We evaluate the use of Reinforcement Learning (RL) to solve a classic optimization problem.
We compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances.
We find that despite not returning the best solution, the RL approach has many advantages over traditional solvers.
arXiv Detail & Related papers (2022-01-14T11:16:17Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently
finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present two novel methods, Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
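To illustrate the replay-ordering idea in the title, here is a minimal tabular sketch of reverse experience replay, where updates sweep a stored trajectory backwards so reward information propagates in one pass; this shows the ordering only, not the Q-Rex algorithm itself.

```python
import numpy as np

# Illustrative sketch: tabular Q-learning updates applied to a trajectory in
# reverse order (reverse experience replay).

def q_update_reverse(Q, trajectory, alpha=0.1, gamma=0.99):
    """Q: [num_states, num_actions] table.
    trajectory: list of (state, action, reward, next_state) tuples."""
    for s, a, r, s_next in reversed(trajectory):
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
    return Q
```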
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
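A minimal sketch of the two ingredients described above, assuming an ensemble of Q-networks whose outputs are stacked into tensors; the specific weighting function and shapes are illustrative, not necessarily the paper's exact choices.

```python
import torch

# Illustrative sketch of SUNRISE-style ingredients:
# (a) weighted Bellman backups that down-weight uncertain targets, and
# (b) UCB action selection from an ensemble of Q-values.

def weighted_bellman_targets(target_qs, rewards, dones, gamma=0.99, temperature=10.0):
    """target_qs: [ensemble_size, batch] target Q-values at (s', a')."""
    mean_q, std_q = target_qs.mean(dim=0), target_qs.std(dim=0)
    weights = torch.sigmoid(-std_q * temperature) + 0.5   # high std -> smaller weight
    targets = rewards + gamma * (1.0 - dones) * mean_q
    return targets, weights                                # weights scale each TD loss term

def ucb_action(q_ensemble, lam=1.0):
    """q_ensemble: [ensemble_size, num_actions] Q-values at the current state."""
    score = q_ensemble.mean(dim=0) + lam * q_ensemble.std(dim=0)
    return int(torch.argmax(score))
```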
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Lookahead-Bounded Q-Learning [8.738692817482526]
We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning.
Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
arXiv Detail & Related papers (2020-06-28T19:50:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.