Related papers: Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

URL: http://arxiv.org/abs/2405.17784v2
Date: Mon, 3 Jun 2024 20:23:49 GMT
Title: Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation
Authors: Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, Animesh Garg,
Abstract summary: This paper introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.
Score: 36.308936312224404
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Model-Free Reinforcement Learning (MFRL), leveraging the policy gradient theorem, has demonstrated considerable success in continuous control tasks. However, these approaches are plagued by high gradient variance due to zeroth-order gradient estimation, resulting in suboptimal policies. Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods employing differentiable simulation provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact. This paper investigates the source of this error and introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.

Related papers

LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning [39.56217775141507]
Low-rAnk Regulated Gradient Projection (LARGO) algorithm integrates dynamic constraints into low-rank adaptation methods.<n>LARGO achieves state-of-the-art performance across in-domain and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-06-14T08:19:11Z)
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning [50.856589224454055]
Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs)<n>We propose regularized policy gradient (RPG), a framework for deriving and analyzing KL-regularized policy gradient methods in the online reinforcement learning setting.<n>RPG shows improved or competitive results in terms of training stability and performance compared to strong baselines such as GRPO, REINFORCE++, and DAPO.
arXiv Detail & Related papers (2025-05-23T06:01:21Z)
Adaptive Multi-Fidelity Reinforcement Learning for Variance Reduction in Engineering Design Optimization [0.0]
Multi-fidelity Reinforcement Learning (RL) frameworks efficiently utilize computational resources by integrating analysis models of varying accuracy and costs. This work proposes a novel adaptive multi-fidelity RL framework, in which multiple heterogeneous, non-hierarchical low-fidelity models are dynamically leveraged alongside a high-fidelity model. The effectiveness of the approach is demonstrated in an octocopter design optimization problem, utilizing two low-fidelity models alongside a high-fidelity simulator.
arXiv Detail & Related papers (2025-03-23T22:29:08Z)
ROCM: RLHF on consistency models [8.905375742101707]
We propose a reward optimization framework for applying RLHF to consistency models. We investigate various $f$-divergences as regularization strategies, striking a balance between reward and model consistency.
arXiv Detail & Related papers (2025-03-08T11:19:48Z)
Avoiding mode collapse in diffusion models fine-tuned with reinforcement learning [0.0]
Fine-tuning foundation models via reinforcement learning (RL) has proven promising for aligning to downstream objectives. We exploit the hierarchical nature of diffusion models (DMs) and train them dynamically at each epoch with a tailored RL method. We show that models trained with HRF achieve better preservation of diversity in downstream tasks, thus enhancing the fine-tuning robustness and at uncompromising mean rewards.
arXiv Detail & Related papers (2024-10-10T19:06:23Z)
Actor-Critic Reinforcement Learning with Phased Actor [10.577516871906816]
We propose a novel phased actor in actor-critic (PAAC) method to improve policy gradient estimation. PAAC accounts for both $Q$ value and TD error in its actor update. Results show that PAAC leads to significant performance improvement measured by total cost, learning variance, robustness, learning speed and success rate.
arXiv Detail & Related papers (2024-04-18T01:27:31Z)
Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $mathcalO( ln(T) / T 1 - frac1alpha ).
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes. We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
Sim-to-Real Transfer of Adaptive Control Parameters for AUV Stabilization under Current Disturbance [1.099532646524593]
This paper presents a novel approach, merging the Maximum Entropy Deep Reinforcement Learning framework with a classic model-based control architecture, to formulate an adaptive controller. Within this framework, we introduce a Sim-to-Real transfer strategy comprising the following components: a bio-inspired experience replay mechanism, an enhanced domain randomisation technique, and an evaluation protocol executed on a physical platform. Our experimental assessments demonstrate that this method effectively learns proficient policies from suboptimal simulated models of the AUV, resulting in control performance 3 times higher when transferred to a real-world vehicle.
arXiv Detail & Related papers (2023-10-17T08:46:56Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL) Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC) Our algorithm alleviates problems with local minima through a smooth critic function. We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension. We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.