XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2509.25174v1
- Date: Mon, 29 Sep 2025 17:58:53 GMT
- Title: XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
- Authors: Daniel Palenicek, Florian Vogt, Joe Watson, Ingmar Posner, Jan Peters,
- Abstract summary: We introduce XQC: a well-motivated, sample-efficient deep actor-critic algorithm built upon soft actor-critic.<n>We achieve state-of-the-art sample efficiency across 55 proprioception and 15 vision-based continuous control tasks.
- Score: 26.063477716451512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sample efficiency is a central property of effective deep reinforcement learning algorithms. Recent work has improved this through added complexity, such as larger models, exotic network architectures, and more complex algorithms, which are typically motivated purely by empirical performance. We take a more principled approach by focusing on the optimization landscape of the critic network. Using the eigenspectrum and condition number of the critic's Hessian, we systematically investigate the impact of common architectural design decisions on training dynamics. Our analysis reveals that a novel combination of batch normalization (BN), weight normalization (WN), and a distributional cross-entropy (CE) loss produces condition numbers orders of magnitude smaller than baselines. This combination also naturally bounds gradient norms, a property critical for maintaining a stable effective learning rate under non-stationary targets and bootstrapping. Based on these insights, we introduce XQC: a well-motivated, sample-efficient deep actor-critic algorithm built upon soft actor-critic that embodies these optimization-aware principles. We achieve state-of-the-art sample efficiency across 55 proprioception and 15 vision-based continuous control tasks, all while using significantly fewer parameters than competing methods.
Related papers
- CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design [3.549422886703227]
CORE is a constraint-aware, one-step reinforcement learning method for simulationguided DSE.<n>We instantiate CORE for hardware-mapping co-design of neural network accelerators.
arXiv Detail & Related papers (2025-06-04T01:08:34Z) - Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv Detail & Related papers (2024-09-21T15:30:43Z) - Rethinking Value Function Learning for Generalization in Reinforcement
Learning [11.516147824168732]
We focus on the problem of training RL agents on multiple training environments to improve observational generalization performance.
We identify that the value network in the multiple-environment setting is more challenging to optimize and prone to overfitting training data than in the conventional single-environment setting.
We propose Delayed-Critic Policy Gradient (DCPG), which implicitly penalizes the value estimates by optimizing the value network less frequently with more training data than the policy network.
arXiv Detail & Related papers (2022-10-18T16:17:47Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Unsupervised Learning for Combinatorial Optimization with Principled
Objective Relaxation [19.582494782591386]
This work proposes an unsupervised learning framework for optimization (CO) problems.
Our key contribution is the observation that if the relaxed objective satisfies entry-wise concavity, a low optimization loss guarantees the quality of the final integral solutions.
In particular, this observation can guide the design of objective models in applications where the objectives are not given explicitly while requiring being modeled in prior.
arXiv Detail & Related papers (2022-07-13T06:44:17Z) - Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and
Stability [67.8426046908398]
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
arXiv Detail & Related papers (2022-04-08T20:46:16Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Weighted Aggregating Stochastic Gradient Descent for Parallel Deep
Learning [8.366415386275557]
Solution involves a reformation of the objective function for optimization in neural network models.
We introduce a decentralized weighted aggregating scheme based on the performance of local workers.
To validate the new method, we benchmark our schemes against several popular algorithms.
arXiv Detail & Related papers (2020-04-07T23:38:29Z) - Second-Order Guarantees in Centralized, Federated and Decentralized
Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying non optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.