Scalable Semantic Non-Markovian Simulation Proxy for Reinforcement
Learning
- URL: http://arxiv.org/abs/2310.06835v2
- Date: Sun, 15 Oct 2023 01:14:23 GMT
- Title: Scalable Semantic Non-Markovian Simulation Proxy for Reinforcement
Learning
- Authors: Kaustuv Mukherji, Devendra Parkar, Lahari Pokala, Dyuman Aditya, Paulo
Shakarian, Clark Dorman
- Abstract summary: We propose a semantic proxy for simulation based on a temporal extension to annotated logic.
We show up to three orders of magnitude speed-up while preserving the quality of policy learned.
- Score: 0.125828876338076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in reinforcement learning (RL) have shown much promise across
a variety of applications. However, issues such as scalability, explainability,
and Markovian assumptions limit its applicability in certain domains. We
observe that many of these shortcomings emanate from the simulator as opposed
to the RL training algorithms themselves. As such, we propose a semantic proxy
for simulation based on a temporal extension to annotated logic. In comparison
with two high-fidelity simulators, we show up to three orders of magnitude
speed-up while preserving the quality of policy learned. In addition, we show
the ability to model and leverage non-Markovian dynamics and instantaneous
actions while providing an explainable trace describing the outcomes of the
agent actions.
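To make the idea concrete, here is a minimal sketch of a rule-based simulation proxy with a gym-style step() interface. The Rule/RuleProxyEnv classes and the rule syntax are illustrative assumptions, not the authors' implementation, which builds on a temporal extension of annotated logic:

```python
# Illustrative sketch (not the authors' implementation): a rule-based
# simulation proxy with a gym-style interface. Rules fire on the *history*
# of facts rather than only the current state, which is how non-Markovian
# dynamics enter; annotations are confidence bounds in the spirit of
# annotated logic, and the returned trace makes each outcome explainable.

class Rule:
    def __init__(self, head, condition, delay, annotation):
        self.head = head              # fact to derive, e.g. "goal"
        self.condition = condition    # predicate over (past, now, action)
        self.delay = delay            # how many steps back the body looks
        self.annotation = annotation  # confidence interval [lo, hi]

class RuleProxyEnv:
    """step() forward-chains rules instead of integrating a simulator."""
    def __init__(self, rules, init_facts):
        self.rules = rules
        self.history = [set(init_facts)]   # full trace kept for rule bodies

    def step(self, action):
        derived = set(self.history[-1])
        trace = []                         # explainable record of firings
        for r in self.rules:
            past = (self.history[-1 - r.delay]
                    if len(self.history) > r.delay else set())
            if r.condition(past, self.history[-1], action):
                derived.add(r.head)
                trace.append((r.head, r.annotation))
        self.history.append(derived)
        done = "goal" in derived
        return derived, float(done), done, {"trace": trace}

# Example: "open" only reaches the goal if the key was picked up at least
# two steps earlier -- a non-Markovian condition no flat state can express.
rules = [
    Rule("has_key", lambda past, now, a: a == "pickup",
         delay=0, annotation=(1.0, 1.0)),
    Rule("goal", lambda past, now, a: a == "open" and "has_key" in past,
         delay=2, annotation=(0.9, 1.0)),
]
env = RuleProxyEnv(rules, init_facts=["start"])
for a in ["pickup", "wait", "wait", "open"]:
    state, reward, done, info = env.step(a)
    print(a, sorted(state), reward, info["trace"])
```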
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
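As a hedged illustration of the APG idea, the toy sketch below backpropagates a return through scalar linear dynamics x' = a*x + b*u with policy u = -k*x, hand-rolling the reverse pass; the paper differentiates through a full AV simulator, and all names and constants here are illustrative:

```python
# Toy sketch of analytic policy gradients (APG): because the dynamics are
# differentiable, the return can be backpropagated through the rollout
# itself, giving exact gradients of cost with respect to policy parameters.

def rollout(k, x0, a=0.9, b=0.5, c=0.1, T=20):
    xs, us, cost = [x0], [], 0.0
    for _ in range(T):
        u = -k * xs[-1]
        cost += xs[-1] ** 2 + c * u ** 2
        us.append(u)
        xs.append(a * xs[-1] + b * u)   # differentiable "simulator" step
    return xs, us, cost

def analytic_grad(k, x0, a=0.9, b=0.5, c=0.1, T=20):
    xs, us, _ = rollout(k, x0, a, b, c, T)
    grad, lam = 0.0, 0.0                # lam = dJ/dx_{t+1}, the adjoint
    for t in reversed(range(T)):
        dJ_du = 2 * c * us[t] + lam * b          # cost + downstream effect
        grad += dJ_du * (-xs[t])                 # du_t/dk = -x_t
        lam = 2 * xs[t] + lam * a + dJ_du * (-k) # dJ/dx_t
    return grad

# Gradient descent on the controller gain, using the simulator's gradients.
k = 0.0
for _ in range(200):
    k -= 0.01 * analytic_grad(k, x0=1.0)
print("learned gain:", round(k, 3), "cost:", round(rollout(k, 1.0)[2], 3))
```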
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
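The summary leaves the quantizer unspecified; as a simple stand-in for the paper's adaptive scheme, the sketch below builds a k-means codebook over the dataset's continuous actions so that any discrete-action offline RL method can run on top (the function names are hypothetical):

```python
import numpy as np

# Hedged sketch: quantize a continuous-action offline dataset with a
# k-means codebook. The paper's scheme adapts the quantization to the
# data; plain k-means is the simplest data-adaptive stand-in, not the
# authors' exact method.

def fit_codebook(actions, n_codes=16, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    codebook = actions[rng.choice(len(actions), n_codes, replace=False)]
    for _ in range(iters):
        # assign each action to its nearest code
        d = np.linalg.norm(actions[:, None, :] - codebook[None], axis=-1)
        assign = d.argmin(axis=1)
        # move each code to the mean of its assigned actions
        for c in range(n_codes):
            members = actions[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook

def quantize(actions, codebook):
    d = np.linalg.norm(actions[:, None, :] - codebook[None], axis=-1)
    return d.argmin(axis=1)              # discrete action indices

# Usage: replace continuous actions with code indices, train any
# discrete-action method (IQL, CQL, ...), then execute the decoded
# codebook entry at deployment time.
acts = np.random.randn(10_000, 2)
cb = fit_codebook(acts, n_codes=8)
idx = quantize(acts[:5], cb)
print(idx, cb[idx])                      # indices and decoded actions
```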
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Anchored Learning for On-the-Fly Adaptation -- Extended Technical Report [45.123633153460034]
This study presents "anchor critics", a novel strategy for enhancing the robustness of reinforcement learning (RL) agents in crossing the sim-to-real gap.
We identify that naive fine-tuning approaches lead to catastrophic forgetting, where policies maintain high rewards on frequently encountered states but lose performance on rarer, yet critical scenarios.
Evaluations demonstrate that our approach enables behavior retention in sim-to-sim gymnasium tasks and in sim-to-real scenarios with racing quadrotors, achieving a near-50% reduction in power consumption while maintaining controllable, stable flight.
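The summary does not spell out the mechanism; one plausible reading, sketched below with hypothetical names and a tabular critic for brevity, is to fine-tune on target-domain batches while penalizing drift from a frozen pre-trained critic on replayed simulation states:

```python
import numpy as np

# Hypothetical sketch (not the paper's published loss): fine-tune the
# critic on real-world batches while an "anchor" term keeps its values
# close to a frozen, pre-trained sim critic on replayed simulation
# states, so rarely revisited behaviors are not forgotten.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 10, 4, 0.99
q_anchor = rng.normal(size=(n_states, n_actions))  # frozen sim critic
q = q_anchor.copy()                                # critic being fine-tuned

def fine_tune_step(q, real_batch, sim_states, beta=1.0, lr=0.1):
    s, a, r, s2 = real_batch
    grad = np.zeros_like(q)
    for si, ai, ri, s2i in zip(s, a, r, s2):       # TD error on real data
        grad[si, ai] += 2 * (q[si, ai] - (ri + gamma * q[s2i].max()))
    # anchor term: stay close to the frozen critic on simulation states
    np.add.at(grad, sim_states,
              2 * beta * (q[sim_states] - q_anchor[sim_states]))
    return q - lr * grad / len(s)

real_batch = (rng.integers(0, n_states, 32), rng.integers(0, n_actions, 32),
              rng.normal(size=32), rng.integers(0, n_states, 32))
sim_states = rng.integers(0, n_states, 32)
q = fine_tune_step(q, real_batch, sim_states)
print("max drift from anchor:", np.abs(q - q_anchor).max())
```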
arXiv Detail & Related papers (2023-01-17T16:16:53Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
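The mechanism is easy to demonstrate: a pairwise interaction that is an odd function of the displacement satisfies f_ij = -f_ji, so internal forces cancel in pairs and total linear momentum cannot change. The numpy sketch below uses a hand-rolled odd function in place of the paper's antisymmetrical continuous convolutions:

```python
import numpy as np

# Minimal illustration of the mechanism (not the paper's network): if the
# learned pairwise interaction is odd in the displacement,
# f(x_i - x_j) = -f(x_j - x_i), every internal force has an equal and
# opposite partner and momentum is conserved by construction.

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 3))

def g(d):                        # arbitrary learned map, d: (..., 3)
    return np.tanh(d @ W1) @ W2

def pair_force(d):               # antisymmetrization: odd in d by design
    return g(d) - g(-d)

pos = rng.normal(size=(8, 3))    # 8 particles in 3-D, unit masses
vel = rng.normal(size=(8, 3))
disp = pos[:, None, :] - pos[None, :, :]    # x_i - x_j, shape (8, 8, 3)
forces = pair_force(disp).sum(axis=1)       # net force on each particle

vel_new = vel + 0.01 * forces               # one explicit Euler step
# total momentum is unchanged up to floating-point error:
print(np.abs(vel.sum(0) - vel_new.sum(0)).max())   # ~1e-15
```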
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Backpropagation through Time and Space: Learning Numerical Methods with Multi-Agent Reinforcement Learning [6.598324641949299]
We treat the numerical schemes underlying partial differential equations as a Partially Observable Markov Game (POMG) in Reinforcement Learning (RL).
Similar to numerical solvers, our agent acts at each discrete location of a computational space for efficient, generalizable learning.
To learn higher-order spatial methods by acting on local states, the agent must discern how its actions at a given temporal location affect the future evolution of the state.
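As an illustrative toy version of this setup (not the paper's code), the sketch below places one agent per grid cell of a 1-D advection equation: each agent observes only its local neighborhood and picks a blend between upwind and central stencils, and the joint action is the global update; training of the shared policy is omitted:

```python
import numpy as np

# Toy version of the POMG view of a numerical scheme: each grid cell
# hosts an agent with a partial observation (its neighborhood); the
# "environment" step is just the discretized PDE applied to the joint
# action. Here: u_t + c u_x = 0 on a periodic domain.

nx, c, dx, dt = 64, 1.0, 1.0 / 64, 0.5 / 64
x = np.arange(nx) * dx
u = np.exp(-200 * (x - 0.3) ** 2)          # initial pulse

def local_obs(u):
    # partial observation: each agent sees (left, self, right)
    return np.stack([np.roll(u, 1), u, np.roll(u, -1)], axis=-1)

def agent_policy(obs, theta):
    # per-cell blend weight in [0, 1] between upwind and central stencils
    return 1.0 / (1.0 + np.exp(-(obs @ theta)))

def step(u, alpha):
    upwind = (u - np.roll(u, 1)) / dx
    central = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)
    return u - c * dt * (alpha * upwind + (1 - alpha) * central)

theta = np.zeros(3)                        # shared (untrained) parameters
for _ in range(200):
    u = step(u, agent_policy(local_obs(u), theta))

# compare against the exact advected pulse on the periodic domain
d = (x - 0.3 - c * 200 * dt + 0.5) % 1.0 - 0.5
exact = np.exp(-200 * d ** 2)
print("L2 error vs exact solution:", np.linalg.norm(u - exact) * np.sqrt(dx))
```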
arXiv Detail & Related papers (2022-03-16T20:50:24Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z)
- Large-Scale Multi-Agent Deep FBSDEs [28.525065041507982]
We present a framework for finding Markovian Nash Equilibria in multi-agent games using fictitious play.
We showcase superior performance of our framework over the state-of-the-art deep fictitious play algorithm.
We also demonstrate the applicability of our framework in robotics on a belief space autonomous racing problem.
arXiv Detail & Related papers (2020-11-21T23:00:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.