Related papers: Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

URL: http://arxiv.org/abs/2306.16750v2
Date: Wed, 8 Nov 2023 09:38:35 GMT
Title: Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
Authors: Qiang He and Tianyi Zhou and Meng Fang and Setareh Maghsudi
Abstract summary: Eigensubspace Regularized Critic (ERC) is motivated by an analysis of dynamics of Q-value approximation error. We show that ERC effectively reduces the variance of value functions. It shows significant advantages in Q-value approximation and variance reduction.
Score: 44.663181739694664
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a novel value approximation method, namely Eigensubspace Regularized Critic (ERC) for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). It reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error tending towards the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. Moreover, we theoretically prove the convergence of the ERC method. Besides, theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. Among 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods for 20. Besides, it shows significant advantages in Q-value approximation and variance reduction. Our code is available at https://sites.google.com/view/erc-ecml23/.

Related papers

TRSVR: An Adaptive Stochastic Trust-Region Method with Variance Reduction [17.083793956698994]
We propose a trust method for unconstrained non-reduced optimization that incorporates variance-region (SVRG) to accelerate convergence.<n>The proposed algorithm relies solely on gradient information and does not require function value evaluations.
arXiv Detail & Related papers (2026-01-21T04:41:57Z)
Efficient Thought Space Exploration through Strategic Intervention [54.35208611253168]
We propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components.<n>The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), which dynamically identifies intervention points.<n> Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs.
arXiv Detail & Related papers (2025-11-13T07:26:01Z)
Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization [56.805574957824135]
Two-way partial AUCAUC is a critical performance metric for binary classification with imbalanced data.<n>Existing algorithms for TPAUC optimization remain under-explored.<n>We introduce two innovative double-coordinate block-coordinate algorithms for TPAUC optimization.
arXiv Detail & Related papers (2025-05-28T03:55:05Z)
Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains. We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z)
Conformal Risk Minimization with Variance Reduction [37.74931189657469]
Conformal prediction (CP) is a distribution-free framework for achieving probabilistic guarantees on black-box models. Recent research efforts have focused on optimizing CP efficiency during training. We formalize this concept as the problem of conformal risk minimization.
arXiv Detail & Related papers (2024-11-03T21:48:15Z)
Fast Value Tracking for Deep Reinforcement Learning [7.648784748888187]
Reinforcement learning (RL) tackles sequential decision-making problems by creating agents that interact with their environment. Existing algorithms often view these problem as static, focusing on point estimates for model parameters to maximize expected rewards. Our research leverages the Kalman paradigm to introduce a novel quantification and sampling algorithm called Langevinized Kalman TemporalTD.
arXiv Detail & Related papers (2024-03-19T22:18:19Z)
Critic-Guided Decision Transformer for Offline Reinforcement Learning [28.211835303617118]
Critic-Guided Decision Transformer (CGDT) Uses predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer. Builds upon these insights, we propose a novel approach, which combines the predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer.
arXiv Detail & Related papers (2023-12-21T10:29:17Z)
Normality-Guided Distributional Reinforcement Learning for Continuous Control [16.324313304691426]
Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirical quite close to normal. We propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function.
arXiv Detail & Related papers (2022-08-28T02:52:10Z)
False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL)
arXiv Detail & Related papers (2021-10-24T15:34:03Z)
Robust Learning via Persistency of Excitation [4.674053902991301]
We show that network training using gradient descent is equivalent to a dynamical system parameter estimation problem. We provide an efficient technique for estimating the corresponding Lipschitz constant using extreme value theory. Our approach also universally increases the adversarial accuracy by 0.1% to 0.3% points in various state-of-the-art adversarially trained models.
arXiv Detail & Related papers (2021-06-03T18:49:05Z)
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP. DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize. We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z)
A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces [53.47210316424326]
KeRNS is an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes. We prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time.
arXiv Detail & Related papers (2020-07-09T21:37:13Z)
Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics. Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis. We present thresholding as a provably accurate, time, approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function. It suggests using regularization as a practical tool for dealing with $textitexternal uncertainty$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.