Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning
- URL: http://arxiv.org/abs/2501.10605v2
- Date: Fri, 07 Mar 2025 18:35:41 GMT
- Title: Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning
- Authors: Ali Baheri, Zahra Shahrooei, Chirayu Salgarkar
- Abstract summary: We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE). WAVE addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves an $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical guarantees for stability through Wasserstein-based regularization.
- Score: 3.686808512438363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE), an approach to enhance stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical guarantees for stability through Wasserstein-based regularization. Using the Sinkhorn approximation for computational efficiency, our approach automatically adjusts the regularization based on the agent's performance. Theoretical analysis and experimental results demonstrate that WAVE achieves superior performance compared to standard actor-critic methods.
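The abstract's core recipe, a TD mean-squared error plus an adaptively weighted, Sinkhorn-approximated Wasserstein term, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the uniform sample weights, and the idea that `lam` would be tuned online from agent performance are all assumptions made for the example.

```python
import numpy as np

def sinkhorn_distance(a_samples, b_samples, eps=0.1, n_iters=100):
    """Entropic-regularized OT cost between two 1-D empirical distributions
    (Sinkhorn approximation; uniform weights on samples)."""
    n, m = len(a_samples), len(b_samples)
    C = np.abs(a_samples[:, None] - b_samples[None, :])  # ground cost |x - y|
    K = np.exp(-C / eps)
    a, b = np.ones(n) / n, np.ones(m) / m
    u, v = np.ones(n) / n, np.ones(m) / m
    for _ in range(n_iters):                 # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]          # approximate transport plan
    return float(np.sum(P * C))

def wave_critic_loss(q_pred, td_target, lam):
    """Illustrative WAVE-style critic loss: TD mean-squared error plus an
    adaptively weighted Sinkhorn term between the predicted and target
    value distributions. `lam` stands in for the adaptive weight."""
    mse = float(np.mean((q_pred - td_target) ** 2))
    return mse + lam * sinkhorn_distance(q_pred, td_target)
```

With `lam = 0` this reduces to the standard critic MSE; the Sinkhorn term penalizes distributional mismatch between predicted and bootstrapped values beyond the pointwise error.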
Related papers
- Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications [10.150648641677828]
Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI). This paper formulates the optimisation of the IW-ELBO in Bures-Wasserstein space. A pivotal contribution of our analysis concerns the stability of the gradient estimator.
arXiv Detail & Related papers (2026-02-04T07:01:56Z) - Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic [7.536387580547838]
We argue that policy non-smoothness is governed by the differential geometry of the critic. We introduce PAVE, a critic-centric regularization framework. PAVE rectifies the learning signal by minimizing the Q-gradient volatility while preserving local curvature.
arXiv Detail & Related papers (2026-01-30T13:32:52Z) - On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization [57.179679246370114]
A potential limitation of existing methods is the bias inherent in most perturbation estimators unless a stepsize is proposed. We propose a novel family of unbiased gradient scaling estimators that eliminate bias while maintaining favorable construction.
arXiv Detail & Related papers (2025-10-22T18:25:43Z) - Deterministic Coreset Construction via Adaptive Sensitivity Trimming [0.2864713389096699]
We develop a framework for deterministic coreset construction in empirical risk minimization. Our central contribution is the Adaptive Deterministic Uniform-Weight Trimming (ADUWT) algorithm. We conclude with open problems on instance-optimal oracles, deterministic streaming, and fairness-constrained ERM.
arXiv Detail & Related papers (2025-08-25T17:19:13Z) - Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation [1.8722948221596285]
We present a theoretical analysis of adaptive ISTA with a thresholding strategy that estimates the noise level by the median absolute deviation. We show properties of the fixed points of the algorithm, including scale equivariance, non-uniqueness, and local stability, prove a local linear convergence guarantee, and characterize its global convergence behavior.
arXiv Detail & Related papers (2025-07-02T18:41:59Z) - Efficient Adaptive Experimentation with Non-Compliance [39.43227019824619]
We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged, rather than directly assigned, via a binary instrumental variable. We introduce AMRIV, an online policy that (i) adaptively approximates the optimal allocation and (ii) pairs it with a sequential, influence-function-based estimator that attains the semi-parametric efficiency bound while retaining multiply-robust consistency.
arXiv Detail & Related papers (2025-05-23T04:49:14Z) - Bayesian Optimization for Robust Identification of Ornstein-Uhlenbeck Model [4.0148499400442095]
This paper deals with the identification of the Ornstein-Uhlenbeck (OU) process error model.
We put forth a sample-efficient global optimization approach based on the Bayesian optimization framework.
arXiv Detail & Related papers (2025-03-09T01:38:21Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\left( \ln(T) / T^{1 - \frac{1}{\alpha}} \right)$.
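As a generic reminder of the AdaGrad update this entry builds on (a plain single-node sketch, not the federated over-the-air variant the paper studies), each coordinate's step is scaled by the root of its accumulated squared gradients:

```python
import numpy as np

def adagrad(grad_fn, x0, lr=1.0, eps=1e-8, n_steps=100):
    """Plain AdaGrad: per-coordinate steps divided by the root of the
    running sum of squared gradients. Sketch for intuition only; the
    cited paper analyzes a federated, over-the-air variant."""
    x = np.array(x0, dtype=float)
    G = np.zeros_like(x)                  # accumulated squared gradients
    for _ in range(n_steps):
        g = grad_fn(x)
        G += g ** 2
        x -= lr * g / (np.sqrt(G) + eps)
    return x
```

The accumulation in `G` is what produces the slowly decaying effective step size that drives rates of the $\ln(T)/T^{1-1/\alpha}$ form.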
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike fixed exploration and exploitation balance, caution and probing are employed automatically by the controller in real-time, even after the learning process is terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
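As a toy illustration of the extra-gradient template this entry refers to (not the paper's DRSL algorithm), the sketch below solves the bilinear saddle problem $\min_x \max_y xy$, where simultaneous gradient descent-ascent spirals outward but extra-gradient contracts to the saddle point at the origin:

```python
def extragradient_bilinear(x, y, lr=0.1, n_steps=2000):
    """Extra-gradient on f(x, y) = x * y: take a look-ahead step, then
    update from the original point using the look-ahead gradients.
    Toy sketch of the min-max template, not the paper's algorithm."""
    for _ in range(n_steps):
        # look-ahead (extrapolation) step
        x_mid = x - lr * y        # grad_x f = y
        y_mid = y + lr * x        # grad_y f = x
        # update step using the look-ahead gradients
        x, y = x - lr * y_mid, y + lr * x_mid
    return x, y
```

The extrapolation step is what distinguishes extra-gradient from plain descent-ascent: using gradients evaluated at the look-ahead point turns the outward spiral into a contraction.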
arXiv Detail & Related papers (2021-04-27T16:56:09Z) - Stochastic Optimization of Areas Under Precision-Recall Curves with
Provable Convergence [66.83161885378192]
Area under the ROC curve (AUROC) and area under the precision-recall curve (AUPRC) are common metrics for evaluating classification performance on imbalanced problems.
We propose a technical method to optimize AUPRC for deep learning.
arXiv Detail & Related papers (2021-04-18T06:22:21Z) - Offline Reinforcement Learning with Fisher Divergence Critic
Regularization [41.085156836450466]
We propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy plus a state-action value offset term.
Behavior regularization then corresponds to an appropriate regularizer on the offset term.
Our algorithm Fisher-BRC achieves both improved performance and faster convergence over existing state-of-the-art methods.
arXiv Detail & Related papers (2021-03-14T22:11:40Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z) - How to Learn a Useful Critic? Model-based Action-Gradient-Estimator
Policy Optimization [10.424426548124696]
We propose MAGE, a model-based actor-critic algorithm, grounded in the theory of policy gradients.
MAGE backpropagates through the learned dynamics to compute gradient targets in temporal difference learning.
We demonstrate the efficiency of the algorithm in comparison to model-free and model-based state-of-the-art baselines.
arXiv Detail & Related papers (2020-04-29T16:30:53Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.