Related papers: Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent

Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent

URL: http://arxiv.org/abs/2501.14443v1
Date: Fri, 24 Jan 2025 12:23:12 GMT
Title: Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent
Authors: Lucía Güitta-López, Jaime Boal, Álvaro J. López-López,
Abstract summary: This paper analyzes the robustness of a state-of-the-art sim-to-real technique known as progressive neural networks (PNNs)<n> Randomizing certain variables during simulation-based training significantly mitigates this issue.<n>The increase in the model's accuracy is around 25% when diversity is introduced in the training process.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The industrial application of Deep Reinforcement Learning (DRL) is frequently slowed down because of the inability to generate the experience required to train the models. Collecting data often involves considerable time and economic effort that is unaffordable in most cases. Fortunately, devices like robots can be trained with synthetic experience thanks to virtual environments. With this approach, the sample efficiency problems of artificial agents are mitigated, but another issue arises: the need for efficiently transferring the synthetic experience into the real world (sim-to-real). This paper analyzes the robustness of a state-of-the-art sim-to-real technique known as progressive neural networks (PNNs) and studies how adding diversity to the synthetic experience can complement it. To better understand the drivers that lead to a lack of robustness, the robotic agent is still tested in a virtual environment to ensure total control on the divergence between the simulated and real models. The results show that a PNN-like agent exhibits a substantial decrease in its robustness at the beginning of the real training phase. Randomizing certain variables during simulation-based training significantly mitigates this issue. On average, the increase in the model's accuracy is around 25% when diversity is introduced in the training process. This improvement can be translated into a decrease in the required real experience for the same final robustness performance. Notwithstanding, adding real experience to agents should still be beneficial regardless of the quality of the virtual experience fed into the agent.

Related papers

ARTIS: Agentic Risk-Aware Test-Time Scaling via Iterative Simulation [72.78362530982109]
ARTIS, Agentic Risk-Aware Test-Time Scaling via Iterative Simulation, is a framework that decouples exploration from commitment.<n>We show that naive LLM-based simulators struggle to capture rare but high-impact failure modes.<n>We introduce a risk-aware tool simulator that emphasizes fidelity on failure-inducing actions.
arXiv Detail & Related papers (2026-02-02T06:33:22Z)
Sim-to-Real Transfer via a Style-Identified Cycle Consistent Generative Adversarial Network: Zero-Shot Deployment on Robotic Manipulators through Visual Domain Adaptation [1.0499611180329804]
Deep Reinforcement Learning (DRL) compromises its industrial adoption due to the high cost and time demands of real-world training.<n>Virtual environments offer a cost-effective alternative for training DRL agents, but the transfer of learned policies to real setups is hindered by the sim-to-real gap.<n>This work proposes a novel domain adaptation approach relying on a Style-Identified Cycle Consistent Generative Adversarial Network (StyleID-CycleGAN or SICGAN), an original Cycle Consistent Generative Adversarial Network (CycleGAN) based model.
arXiv Detail & Related papers (2026-01-23T11:48:15Z)
Scaling Agent Learning via Experience Synthesis [100.42712232390532]
Reinforcement learning can empower autonomous agents by enabling self-improvement through interaction.<n>But its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity.<n>We introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind.
arXiv Detail & Related papers (2025-11-05T18:58:48Z)
Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning.<n>Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training.
arXiv Detail & Related papers (2025-10-30T11:53:08Z)
Is Diversity All You Need for Scalable Robotic Manipulation? [50.747150672933316]
We investigate the nuanced role of data diversity in robot learning by examining three critical dimensions-task (what to do), embodiment (which robot to use), and expert (who demonstrates)-challenging the conventional intuition of "more diverse is better"<n>We show that task diversity proves more critical than per-task demonstration quantity, benefiting transfer from diverse pre-training tasks to novel downstream scenarios.<n>We propose a distribution debiasing method to mitigate velocity ambiguity, the yielding GO-1-Pro achieves substantial performance gains of 15%, equivalent to using 2.5 times pre-training data.
arXiv Detail & Related papers (2025-07-08T17:52:44Z)
Simulation as Reality? The Effectiveness of LLM-Generated Data in Open-ended Question Assessment [7.695222586877482]
This study investigates the potential and gap of simulative data to address the limitation of AI-based assessment tools. Our findings reveal that while simulative data demonstrates promising results in training automated assessment models, its effectiveness has notable limitations. The absence of real-world noise and biases, which are also present in over-processed real-world data, contributes to this limitation.
arXiv Detail & Related papers (2025-02-10T11:40:11Z)
Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
We introduce a novel framework for learning world models.<n>By providing a scalable and robust framework, we pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z)
VIRL: Volume-Informed Representation Learning towards Few-shot Manufacturability Estimation [0.0]
This work introduces VIRL, a Volume-Informed Representation Learning approach to pre-train a 3D geometric encoder. The model pre-trained by VIRL shows substantial enhancements on demonstrating improved generalizability with limited data.
arXiv Detail & Related papers (2024-06-18T05:30:26Z)
Enhancing Activity Recognition After Stroke: Generative Adversarial Networks for Kinematic Data Augmentation [0.0]
Generalizability of machine learning models for wearable monitoring in stroke rehabilitation is often constrained by the limited scale and heterogeneity of available data. Data augmentation addresses this challenge by adding computationally derived data to real data to enrich the variability represented in the training set. This study employs Conditional Generative Adversarial Networks (cGANs) to create synthetic kinematic data from a publicly available dataset. By training deep learning models on both synthetic and experimental data, we enhanced task classification accuracy: models incorporating synthetic data attained an overall accuracy of 80.0%, significantly higher than the 66.1% seen in models trained solely with real data
arXiv Detail & Related papers (2024-06-12T15:51:00Z)
Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models. This paper investigates the robustness of existing CLTR models in complex and diverse situations. We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation [8.940998315746684]
We propose a model-based reinforcement learning (RL) approach for robotic arm end-tasks. We employ Bayesian neural network models to represent, in a probabilistic way, both the belief and information encoded in the dynamic model during exploration. Our experiments show the advantages of our Bayesian model-based RL approach, with similar quality in the results than relevant alternatives.
arXiv Detail & Related papers (2024-04-02T11:44:37Z)
Learn to Teach: Sample-Efficient Privileged Learning for Humanoid Locomotion over Diverse Terrains [6.967583364984562]
This work proposes a novel one-stage training framework-Learn to Teach (L2T)-which unifies teacher and student policy learning. Our approach recycles simulator samples and synchronizes the learning trajectories through shared dynamics, significantly reducing sample complexities and training time. We validate the RL variant (L2T-RL) through extensive simulations and hardware tests on the Digit robot, demonstrating zero-shot sim-to-real transfer and robust performance over 12+ challenging terrains without depth estimation modules.
arXiv Detail & Related papers (2024-02-09T21:16:43Z)
Facilitating Sim-to-real by Intrinsic Stochasticity of Real-Time Simulation in Reinforcement Learning for Robot Manipulation [1.6686307101054858]
We investigate the properties of intrinsicity of real-time simulation (RT-IS) of off-the-shelf simulation software. RT-IS requires less randomization, is not task-dependent, and achieves better generalizability than the conventional domain-randomization-powered agents.
arXiv Detail & Related papers (2023-04-12T12:15:31Z)
Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning [61.3506230781327]
In robotics, one approach to generate training data builds on simulations based on dynamics models derived from first principles. Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently. We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm.
arXiv Detail & Related papers (2023-03-03T21:55:04Z)
Revisiting the Adversarial Robustness-Accuracy Tradeoff in Robot Learning [121.9708998627352]
Recent work has shown that, in practical robot learning applications, the effects of adversarial training do not pose a fair trade-off. This work revisits the robustness-accuracy trade-off in robot learning by analyzing if recent advances in robust training methods and theory can make adversarial training suitable for real-world robot applications.
arXiv Detail & Related papers (2022-04-15T08:12:15Z)
Adversarial Training is Not Ready for Robot Learning [55.493354071227174]
Adversarial training is an effective method to train deep learning models that are resilient to norm-bounded perturbations. We show theoretically and experimentally that neural controllers obtained via adversarial training are subjected to three types of defects. Our results suggest that adversarial training is not yet ready for robot learning.
arXiv Detail & Related papers (2021-03-15T07:51:31Z)
RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real [74.45688231140689]
We introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. We obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning.
arXiv Detail & Related papers (2020-06-16T08:58:07Z)
Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning [109.77163932886413]
We show how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning. This adaptation uses less than 0.2% of the data necessary to learn the task from scratch. We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning.
arXiv Detail & Related papers (2020-04-21T17:57:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.