Closing the Sim2Real Performance Gap in RL
- URL: http://arxiv.org/abs/2510.17709v1
- Date: Mon, 20 Oct 2025 16:25:13 GMT
- Title: Closing the Sim2Real Performance Gap in RL
- Authors: Akhil S Anand, Shambhuraj Sawant, Jasper Hoffmann, Dirk Reinhardt, Sebastien Gros
- Abstract summary: Sim2Real aims at training policies in high-fidelity simulation environments and effectively transferring them to the real world. Despite the development of accurate simulators and Sim2Real RL approaches, policies trained purely in simulation often suffer significant performance drops when deployed in real environments. We propose a novel framework to address this issue by directly adapting the simulator parameters based on real-world performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sim2Real aims at training policies in high-fidelity simulation environments and effectively transferring them to the real world. Despite the development of accurate simulators and Sim2Real RL approaches, policies trained purely in simulation often suffer significant performance drops when deployed in real environments. This drop is referred to as the Sim2Real performance gap. Current Sim2Real RL methods optimize simulator accuracy and variability as proxies for real-world performance. However, these metrics do not necessarily correlate with the real-world performance of the policy, as established theoretically and empirically in the literature. We propose a novel framework to address this issue by directly adapting the simulator parameters based on real-world performance. We frame this problem as a bi-level RL framework: the inner-level RL trains a policy purely in simulation, and the outer-level RL adapts the simulation model and in-sim reward parameters to maximize the real-world performance of the in-sim policy. We derive, and validate on simple examples, the mathematical tools needed to develop bi-level RL algorithms that close the Sim2Real performance gap.
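The bi-level structure described in the abstract can be illustrated with a deliberately tiny sketch: an inner level that "trains" a policy under a 1-D simulator parameter, and an outer level that searches over that parameter using only the policy's real-world return. All names, the closed-form inner policy, and the true dynamics scale (2.0) are toy assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_policy_in_sim(theta):
    """Inner level (toy): closed-form optimal action for a 1-D simulator
    with reward -(theta * a - 1)^2, i.e. a* = 1 / theta."""
    return 1.0 / theta

def real_world_return(action):
    """Stand-in for a real-world rollout; the true dynamics scale is 2.0."""
    return -(2.0 * action - 1.0) ** 2

# Outer level (sketch): random-search hill climbing on the simulator
# parameter theta, accepting a candidate only if the sim-trained policy
# scores better in the "real" environment.
theta = 1.0
best_ret = real_world_return(train_policy_in_sim(theta))
for _ in range(500):
    cand = theta + 0.1 * rng.standard_normal()
    if cand < 0.1:  # keep the simulator parameter in a sane range
        continue
    ret = real_world_return(train_policy_in_sim(cand))
    if ret > best_ret:
        theta, best_ret = cand, ret
# theta drifts toward the true scale 2.0, closing the toy Sim2Real gap
```

The key point the sketch captures is that the outer loop never inspects simulator accuracy directly; it adapts the simulator solely to maximize the real-world return of the policy the simulator produces.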
Related papers
- PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies [88.78188489161028]
We introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS). PolaRiS is a scalable real-to-sim framework for high-fidelity simulated robot evaluation. We show that PolaRiS evaluations provide a much stronger correlation with real-world generalist policy performance than existing simulated benchmarks.
arXiv Detail & Related papers (2025-12-18T18:49:41Z) - Simulating Environments with Reasoning Models for Agent Training [55.98861707136674]
Building bespoke environments for training is heavy, brittle, and limits progress. We propose two frameworks: Simia-SFT and Simia-RL. Simia-SFT and Simia-RL enable scalable agent training without environment engineering.
arXiv Detail & Related papers (2025-11-03T18:29:57Z) - SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors [58.87134689752605]
We introduce SimBench, the first large-scale, standardized benchmark for a robust, reproducible science of LLM simulation. We show that even the best LLMs today have limited simulation ability (score: 40.80/100), and that performance scales log-linearly with model size. We demonstrate that simulation ability correlates most strongly with deep, knowledge-intensive reasoning.
arXiv Detail & Related papers (2025-10-20T13:14:38Z) - PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization [53.7088694598817]
We introduce PolySim, a whole-body control (WBC) training platform that integrates multiple heterogeneous simulators. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training.
arXiv Detail & Related papers (2025-10-02T06:31:42Z) - Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL [25.991354823569033]
We show that in many regimes, while direct sim2real transfer may fail, we can utilize the simulator to learn a set of exploratory policies.
In particular, in the setting of low-rank MDPs, we show that coupling these exploratory policies with simple, practical approaches yields provable gains.
This is the first evidence that simulation transfer yields a provable gain in reinforcement learning in settings where direct sim2real transfer fails.
arXiv Detail & Related papers (2024-10-26T19:12:27Z) - LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots [20.715834172041763]
We propose LoopSR, a lifelong policy adaptation framework that continuously refines RL policies in the post-deployment stage. LoopSR employs a transformer-based encoder to map real-world trajectories into a latent space. An autoencoder architecture and contrastive learning are adopted to enhance feature extraction of real-world dynamics.
arXiv Detail & Related papers (2024-09-26T16:02:25Z) - A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning [3.1515473193934778]
Off-dynamics Reinforcement Learning seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics.
We propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms.
arXiv Detail & Related papers (2023-12-24T13:09:08Z) - Marginalized Importance Sampling for Off-Environment Policy Evaluation [13.824507564510503]
Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies on real-world robots.
This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world.
Our approach incorporates a simulator along with real-world offline data to evaluate the performance of any policy.
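The core idea of off-environment evaluation can be sketched with plain importance sampling: score a policy under the real-world state distribution using only samples drawn in simulation, reweighted by a density ratio. This is a minimal stand-in for the paper's marginalized estimator, with Gaussian state distributions and a cosine reward chosen purely so the ground truth is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mu, sigma):
    """Gaussian density, used here for both sim and real state distributions."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def reward(s):
    """Toy per-state reward of the policy being evaluated."""
    return np.cos(s)

# Sim states ~ N(0, 1); "real" states ~ N(0.5, 1). We only sample in sim.
sim_mu, real_mu, sigma = 0.0, 0.5, 1.0
s = sim_mu + sigma * rng.standard_normal(200_000)

# Importance weights: real density over sim density, per sample.
w = normal_pdf(s, real_mu, sigma) / normal_pdf(s, sim_mu, sigma)
estimate = np.mean(w * reward(s))  # importance-weighted value estimate

# Closed-form ground truth: E[cos(S)] = exp(-sigma^2 / 2) * cos(real_mu)
truth = np.exp(-sigma**2 / 2) * np.cos(real_mu)
```

With enough samples the weighted estimate matches the closed-form real-world value, even though no state was ever drawn from the real distribution.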
arXiv Detail & Related papers (2023-09-04T20:52:04Z) - A Platform-Agnostic Deep Reinforcement Learning Framework for Effective Sim2Real Transfer towards Autonomous Driving [0.0]
Deep Reinforcement Learning (DRL) has shown remarkable success in solving complex tasks.
However, transferring DRL agents to the real world remains challenging due to significant discrepancies between simulation and reality.
We propose a robust DRL framework that leverages platform-dependent perception modules to extract task-relevant information.
arXiv Detail & Related papers (2023-04-14T07:55:07Z) - AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer [10.173835871228718]
AdaptSim aims to optimize task performance in target (real) environments.
First, we meta-learn an adaptation policy in simulation using reinforcement learning.
We then perform iterative real-world adaptation by inferring new simulation parameter distributions for policy training.
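The iterative loop described above, inferring new simulation parameter distributions from real-world performance, can be sketched with a cross-entropy-method-style update: sample candidate sim parameters, train a (toy, closed-form) policy per sample, score each in the "real" environment, and refit the distribution to the top performers. The toy functions and the true dynamics scale (1.5) are illustrative assumptions, not AdaptSim's learned adaptation policy.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_in_sim(param):
    """Toy inner step: sim-optimal action under dynamics scale `param`."""
    return 1.0 / param

def real_return(action):
    """Stand-in for a real rollout; the true dynamics scale is 1.5."""
    return -(1.5 * action - 1.0) ** 2

# Maintain a Gaussian over the simulator parameter and iteratively refit it
# to the samples whose sim-trained policies score best in the real world.
mu, sigma = 1.0, 0.5
for _ in range(20):
    params = np.clip(mu + sigma * rng.standard_normal(32), 0.2, None)
    scores = np.array([real_return(train_in_sim(p)) for p in params])
    elite = params[np.argsort(scores)[-8:]]        # top 8 by real performance
    mu, sigma = elite.mean(), elite.std() + 1e-3   # refit the distribution
# mu concentrates near the true scale 1.5
```

The distribution-refitting step is what distinguishes this family of methods from one-shot system identification: the simulator parameters are driven by task performance in the target environment, not by matching observed dynamics.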
arXiv Detail & Related papers (2023-02-09T19:10:57Z) - Auto-Tuned Sim-to-Real Transfer [143.44593793640814]
Policies trained in simulation often fail when transferred to the real world.
Current approaches to tackle this problem, such as domain randomization, require prior knowledge and engineering.
We propose a method for automatically tuning simulator system parameters to match the real world.
arXiv Detail & Related papers (2021-04-15T17:59:55Z) - TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z)