Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning
- URL: http://arxiv.org/abs/2411.19732v1
- Date: Fri, 29 Nov 2024 14:25:54 GMT
- Title: Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning
- Authors: Severin Bochem, Eduardo Gonzalez-Sanchez, Yves Bicker, Gabriele Fadini
- Abstract summary: Differentiable simulators offer improved sample efficiency through exact gradients, but can be unstable in contact-rich environments.
This paper introduces a novel approach integrating sharpness-aware optimization into gradient-based reinforcement learning algorithms.
- Abstract: Reinforcement learning often requires extensive training data. Simulation-to-real transfer offers a promising approach to address this challenge in robotics. While differentiable simulators offer improved sample efficiency through exact gradients, they can be unstable in contact-rich environments and may lead to poor generalization. This paper introduces a novel approach integrating sharpness-aware optimization into gradient-based reinforcement learning algorithms. Our simulation results demonstrate that our method, tested on contact-rich environments, significantly enhances policy robustness to environmental variations and action perturbations while maintaining the sample efficiency of first-order methods. Specifically, our approach improves action noise tolerance compared to standard first-order methods and achieves generalization comparable to zeroth-order methods. This improvement stems from finding flatter minima in the loss landscape, associated with better generalization. Our work offers a promising solution to balance efficient learning and robust sim-to-real transfer in robotics, potentially bridging the gap between simulation and real-world performance.
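The core idea, sharpness-aware optimization, perturbs the parameters toward the locally sharpest point before computing the gradient, biasing training toward flat minima. As a rough illustration (not the paper's implementation, and the `rho` value and quadratic toy loss are assumptions for demonstration), a Sharpness-Aware Minimization (SAM) style gradient can be sketched as:

```python
import numpy as np

def sam_gradient(params, loss_grad, rho=0.05):
    """Illustrative SAM-style gradient: evaluate the gradient at an
    adversarially perturbed point params + eps, where eps is a scaled
    ascent step of radius rho. This favors flat minima."""
    g = loss_grad(params)                         # gradient at current params
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent step toward sharpest nearby point
    return loss_grad(params + eps)                # gradient at the perturbed point

# toy quadratic loss L(w) = 0.5 * ||w||^2, so grad L(w) = w
loss_grad = lambda w: w
w = np.array([1.0, -2.0])
g_sam = sam_gradient(w, loss_grad, rho=0.1)
w = w - 0.1 * g_sam  # plain gradient-descent update using the SAM gradient
```

In the paper's setting this gradient would replace the ordinary policy gradient inside a first-order, differentiable-simulation RL algorithm; the extra gradient evaluation is the price paid for the flatter minima associated with better generalization.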
Related papers
- Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
We introduce a novel framework for learning world models.
By providing a scalable and robust framework, we pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z) - Markov Balance Satisfaction Improves Performance in Strictly Batch Offline Imitation Learning [8.92571113137362]
We address a scenario where the imitator relies solely on observed behavior and cannot make environmental interactions during learning.
Unlike state-of-the-art (SOTA) IL methods, this approach tackles the limitations of conventional IL by operating in a more constrained and realistic setting.
We demonstrate consistently superior empirical performance compared to many SOTA IL algorithms.
arXiv Detail & Related papers (2024-08-17T07:17:19Z) - Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - DTC: Deep Tracking Control [16.2850135844455]
We propose a hybrid control architecture that combines the advantages of both worlds to achieve greater robustness, foot-placement accuracy, and terrain generalization.
A deep neural network policy is trained in simulation, aiming to track the optimized footholds.
We demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts.
arXiv Detail & Related papers (2023-09-27T07:57:37Z) - DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control [0.0]
Delayed Markov decision processes fulfill the Markov property by augmenting the state space of agents with a finite time window of recently committed actions.
We introduce a disturbance-augmented Markov decision process in delayed settings as a novel representation to incorporate disturbance estimation in training on-policy reinforcement learning algorithms.
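The delayed-MDP construction above can be illustrated with a minimal wrapper that concatenates the last k committed actions onto each observation so the augmented state is Markovian; the class name, dimensions, and window size k are hypothetical choices for this sketch, not DiAReL's API:

```python
from collections import deque
import numpy as np

class DelayedObsWrapper:
    """Sketch of a delayed-MDP state augmentation: each observation is
    extended with a finite window of the k most recently committed
    actions, restoring the Markov property under action delay."""

    def __init__(self, obs_dim, act_dim, k=3):
        self.k = k
        self.act_dim = act_dim
        self.buffer = deque([np.zeros(act_dim)] * k, maxlen=k)

    def reset(self, obs):
        # clear the action history at the start of an episode
        self.buffer = deque([np.zeros(self.act_dim)] * self.k, maxlen=self.k)
        return self.augment(obs)

    def step(self, obs, action):
        # record the committed action, then build the augmented state
        self.buffer.append(np.asarray(action, dtype=float))
        return self.augment(obs)

    def augment(self, obs):
        # augmented state: [observation, a_{t-k}, ..., a_{t-1}]
        return np.concatenate([np.asarray(obs, dtype=float)] + list(self.buffer))
```

A disturbance estimate would be appended to this augmented state in the same way; the policy then trains on the concatenated vector with any on-policy algorithm.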
arXiv Detail & Related papers (2023-06-15T10:11:38Z) - Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z) - Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer [18.50206483493784]
We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions.
During training, our gated fusion approach enables the prior to guide the initial stages of exploration.
We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation.
arXiv Detail & Related papers (2020-03-11T05:12:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences arising from its use.