OffSim: Offline Simulator for Model-based Offline Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2510.15495v1
- Date: Fri, 17 Oct 2025 10:07:55 GMT
- Title: OffSim: Offline Simulator for Model-based Offline Inverse Reinforcement Learning
- Authors: Woo-Jin Ahn, Sang-Ryul Baek, Yong-Jun Lee, Hyun-Duck Choi, Myo-Taeg Lim
- Abstract summary: OffSim is a novel model-based offline inverse reinforcement learning framework. It emulates environmental dynamics and reward structure directly from expert-generated state-action trajectories. OffSim can subsequently train a policy offline without further interaction with the real environment.
- Score: 8.478536100809693
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement learning algorithms typically utilize an interactive simulator (i.e., environment) with a predefined reward function for policy training. Developing such simulators and manually defining reward functions, however, is often time-consuming and labor-intensive. To address this, we propose an Offline Simulator (OffSim), a novel model-based offline inverse reinforcement learning (IRL) framework, to emulate environmental dynamics and reward structure directly from expert-generated state-action trajectories. OffSim jointly optimizes a high-entropy transition model and an IRL-based reward function to enhance exploration and improve the generalizability of the learned reward. Leveraging these learned components, OffSim can subsequently train a policy offline without further interaction with the real environment. Additionally, we introduce OffSim+, an extension that incorporates a marginal reward for multi-dataset settings to enhance exploration. Extensive MuJoCo experiments demonstrate that OffSim achieves substantial performance gains over existing offline IRL methods, confirming its efficacy and robustness.
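The abstract describes a three-stage pipeline: learn a transition model from expert (state, action, next state) tuples, learn a reward, then train a policy purely inside the learned model. The toy tabular sketch below mirrors only that structure; it is not OffSim. Assumptions: add-alpha smoothing stands in for the high-entropy transition model, an expert visitation-frequency score stands in for the IRL reward (a real IRL method learns a reward that explains the expert's choices against alternatives), the two are fit separately rather than jointly, and value iteration stands in for offline policy training. All function names are hypothetical.

```python
from typing import List, Tuple

Transition = Tuple[int, int, int]  # (state, action, next_state) from expert data


def fit_transition_model(transitions: List[Transition], n_states: int,
                         n_actions: int, alpha: float = 1.0):
    """Hypothetical smoothed count model T[s][a][s'].

    alpha > 0 spreads probability over unseen next states, raising model
    entropy (a crude stand-in for a learned high-entropy dynamics model).
    """
    counts = [[[alpha] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, s_next in transitions:
        counts[s][a][s_next] += 1.0
    return [[[c / sum(row) for c in row] for row in per_s] for per_s in counts]


def fit_reward(transitions: List[Transition], n_states: int, n_actions: int):
    """Hypothetical stand-in for an IRL reward: expert (s, a) visitation frequency."""
    freq = [[0.0] * n_actions for _ in range(n_states)]
    for s, a, _ in transitions:
        freq[s][a] += 1.0
    total = float(len(transitions))
    return [[f / total for f in row] for row in freq]


def value_iteration(T, R, gamma: float = 0.9, iters: int = 200) -> List[int]:
    """Greedy policy computed only from the learned model; no real-environment calls."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(T[s][a]))
              for a in range(n_actions)] for s in range(n_states)]
        V = [max(q) for q in Q]
    return [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
```

For example, if an expert on a 3-state chain always takes action 1, `value_iteration(fit_transition_model(data, 3, 2), fit_reward(data, 3, 2))` recovers action 1 in every state without any further environment interaction.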
Related papers
- Simulating Environments with Reasoning Models for Agent Training [55.98861707136674]
Building bespoke environments for training is heavy, brittle, and limits progress.
We propose two frameworks: Simia-SFT and Simia-RL.
Simia-SFT and Simia-RL enable scalable agent training without environment engineering.
Date: 2025-11-03T18:29:57Z
- Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting [92.57796055887995]
We introduce ECHO, a prompting framework that adapts hindsight experience replay from reinforcement learning for language model agents.
ECHO generates optimized trajectories for alternative goals that could have been achieved during failed attempts.
We evaluate ECHO on stateful versions of XMiniGrid, a text-based navigation and planning benchmark, and PeopleJoinQA, a collaborative information-gathering enterprise simulation.
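ECHO builds on hindsight experience replay (HER). For readers unfamiliar with the RL original, the sketch below shows classic HER goal relabeling with the "final" strategy: a failed episode is reused as a success for the goal it actually reached. This is the standard RL technique, not ECHO's language-model trajectory rewriting; the integer states and sparse 0/1 reward are illustrative assumptions.

```python
from typing import List, Tuple

Transition = Tuple[int, int, int]  # (state, action, next_state)


def sparse_reward(next_state: int, goal: int) -> float:
    """Illustrative goal-conditioned reward: 1 only when the goal is reached."""
    return 1.0 if next_state == goal else 0.0


def relabel_final(episode: List[Transition]):
    """HER 'final' strategy: treat the last state actually reached as the goal.

    Returns (state, action, next_state, goal, reward) tuples whose rewards are
    recomputed under the substituted goal.
    """
    achieved_goal = episode[-1][2]
    return [(s, a, s2, achieved_goal, sparse_reward(s2, achieved_goal))
            for s, a, s2 in episode]
```

An episode that aimed for state 5 but ended in state 3 earns zero reward everywhere under the original goal; after relabeling, its last transition carries reward 1.0 for goal 3, giving the learner a useful training signal from a failure.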
Date: 2025-10-11T18:11:09Z
- UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning [78.86567400365392]
We present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories.
To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation.
Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks.
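The summary does not give Semi-online RL's exact formula, so the sketch below shows only the generic discounted future return, computed backwards as G_t = r_t + gamma * G_{t+1}. How the paper combines this with per-step rewards is not stated here; the function name and default `gamma` are assumptions.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1} for every step of one trajectory.

    A single backward pass reuses G_{t+1}, so the cost is linear in trajectory length.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

With a single success at the end, `discounted_returns([0.0, 0.0, 1.0], gamma=0.5)` assigns `[0.25, 0.5, 1.0]`, so earlier steps receive a decayed share of the final outcome instead of zero.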
Date: 2025-09-15T03:24:08Z
- GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels.
We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter.
In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
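To make "momentum conservation as an explicit constraint" concrete, the sketch below treats each kernel as a point mass and projects predicted velocities back onto a target total momentum. This is an illustrative assumption, not GausSim's mechanism: the uniform velocity shift dv = (P_target - P) / M is simply the correction that minimizes the mass-weighted squared velocity change while restoring the target momentum. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def total_momentum(masses: Sequence[float], velocities: Sequence[Vec3]) -> Vec3:
    """Sum of m_i * v_i over all point masses, per axis."""
    return tuple(sum(m * v[k] for m, v in zip(masses, velocities)) for k in range(3))


def enforce_momentum(masses: Sequence[float], velocities: Sequence[Vec3],
                     target: Vec3) -> List[Vec3]:
    """Hypothetical projection: shift all velocities by (target - P) / M.

    Masses are untouched (mass is trivially conserved), and the new total
    momentum equals `target` up to floating-point rounding.
    """
    total_mass = sum(masses)
    p = total_momentum(masses, velocities)
    dv = tuple((target[k] - p[k]) / total_mass for k in range(3))
    return [tuple(v[k] + dv[k] for k in range(3)) for v in velocities]
```

For example, a 1 kg kernel moving at 1 m/s next to a resting 3 kg kernel, with target momentum zero, is corrected to velocities 0.75 m/s and -0.25 m/s, whose momenta cancel.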
Date: 2024-12-23T18:58:17Z
- Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL [25.991354823569033]
We show that in many regimes, while direct sim2real transfer may fail, we can utilize the simulator to learn a set of exploratory policies.
In particular, in the setting of low-rank MDPs, we show that coupling these exploratory policies with simple, practical approaches.
This is the first evidence that simulation transfer yields a provable gain in reinforcement learning in settings where direct sim2real transfer fails.
Date: 2024-10-26T19:12:27Z
- LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots [20.715834172041763]
We propose LoopSR, a lifelong policy adaptation framework that continuously refines RL policies in the post-deployment stage.
LoopSR employs a transformer-based encoder to map real-world trajectories into a latent space.
An autoencoder architecture and contrastive learning methods are adopted to enhance feature extraction of real-world dynamics.
Date: 2024-09-26T16:02:25Z
- COSBO: Conservative Offline Simulation-Based Policy Optimization [7.696359453385686]
Offline reinforcement learning allows training reinforcement learning models on data from live deployments.
In contrast, simulation environments attempting to replicate the live environment can be used instead of the live data.
We propose a method that combines an imperfect simulation environment with data from the target environment, to train an offline reinforcement learning policy.
Date: 2024-09-22T12:20:55Z
- Imitating Language via Scalable Inverse Reinforcement Learning [34.161807103808016]
We focus on investigating the inverse reinforcement learning perspective on imitation.
We find clear advantages for IRL-based imitation, in particular for retaining diversity while maximizing task performance.
Date: 2024-09-02T16:48:57Z
- Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks [93.38375271826202]
We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks.
We first build a simulator by integrating Gaussian splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks.
In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, programming of expert demonstration training data, and the task understanding capabilities of Liquid networks.
Date: 2024-06-21T13:48:37Z
- A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning [3.1515473193934778]
Off-dynamics Reinforcement Learning seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics.
We propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms.
Date: 2023-12-24T13:09:08Z
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
Date: 2022-11-14T18:36:13Z
- TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
Date: 2021-01-17T00:29:30Z