RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization
- URL: http://arxiv.org/abs/2510.02695v1
- Date: Fri, 03 Oct 2025 03:22:21 GMT
- Title: RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization
- Authors: Kai Fukazawa, Kunal Mundada, Iman Soltani
- Abstract summary: In safety-critical domains, offline reinforcement learning offers an attractive alternative but only if policies deliver high returns without incurring catastrophic lower-tail risk. Here, we introduce the Risk-Aware Multimodal Actor-Critic (RAMAC) framework, which couples an expressive generative actor with a distributional critic. We instantiate RAMAC with diffusion and flow-matching actors and observe consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns on most Stochastic-D4RL tasks.
- Score: 1.593065406609169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) offers an attractive alternative, but only if policies deliver high returns without incurring catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of value conservatism and restricted policy classes, whereas expressive policies have been used only in risk-neutral settings. Here, we address this gap by introducing the Risk-Aware Multimodal Actor-Critic (RAMAC) framework, which couples an expressive generative actor with a distributional critic. RAMAC differentiates a composite objective, combining distributional risk and a behavior-cloning (BC) loss, through the generative path, achieving risk-sensitive learning in complex multimodal scenarios. We instantiate RAMAC with diffusion and flow-matching actors and observe consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns on most Stochastic-D4RL tasks. Code: https://github.com/KaiFukazawa/RAMAC.git
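The abstract reports results in terms of $\mathrm{CVaR}_{0.1}$, the lower-tail conditional value-at-risk of the return. As a rough illustration of what that metric measures, here is a minimal sketch that estimates it empirically as the mean of the worst 10% of returns. The function name `cvar_lower_tail` and the sample returns are hypothetical and are not taken from the RAMAC code; the paper's own estimator may differ (for example, it may work from critic quantiles rather than raw episode returns).

```python
import math

import numpy as np


def cvar_lower_tail(returns, alpha=0.1):
    """Empirical lower-tail CVaR: mean of the worst alpha-fraction of returns."""
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    if sorted_returns.size == 0:
        raise ValueError("returns must contain at least one value")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Number of samples in the lower tail; always keep at least one.
    k = max(1, math.ceil(alpha * sorted_returns.size))
    return float(sorted_returns[:k].mean())


# Hypothetical episode returns: mostly good, with one catastrophic outcome.
episode_returns = [100.0, 95.0, 102.0, 98.0, -50.0, 101.0, 99.0, 97.0, 103.0, 96.0]

print("mean return:", float(np.mean(episode_returns)))
print("CVaR_0.1   :", cvar_lower_tail(episode_returns, alpha=0.1))
```

With these made-up numbers the mean return looks healthy while the CVaR at level 0.1 is dominated by the single catastrophic episode, which is the kind of lower-tail risk a risk-neutral objective would not penalize.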
Related papers
- Conditional Sequence Modeling for Safe Reinforcement Learning [8.858563919623082]
Offline safe reinforcement learning aims to learn policies from a fixed dataset while maximizing performance under cumulative cost constraints.
Most existing offline safe RL methods are trained under a pre-specified threshold.
We propose RCDT, a CSM-based method that supports zero-shot deployment across multiple cost thresholds within a single trained policy.
arXiv Detail & Related papers (2026-02-09T12:22:57Z)
- Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression [2.592761128203891]
Quantile-based action-value iteration methods reduce this bias by learning a distribution of the expected cost-to-go.
Existing methods often require complex neural architectures or manual tradeoffs due to combined cost functions.
We propose a risk-regularized quantile-based algorithm integrating Conditional Value-at-Risk to enforce safety without complex architectures.
arXiv Detail & Related papers (2025-06-08T00:22:00Z)
- Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk [16.176812250762666]
An on-policy safe RL method, called TRC, deals with a CVaR-constrained RL problem using a trust region method.
To achieve outstanding performance in complex environments and satisfy safety constraints quickly, RL methods are required to be sample efficient.
We propose novel surrogate functions that reduce the effect of distributional shift, and introduce an adaptive trust-region constraint to keep the policy from deviating far from the replay buffers.
arXiv Detail & Related papers (2023-12-01T04:29:19Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning [8.089234432461804]
Offline reinforcement-learning (RL) algorithms learn to make decisions using a given, fixed training dataset without the possibility of additional online data collection.
This problem setting is captivating because it holds the promise of utilizing previously collected datasets without any costly or risky interaction with the environment.
We present a simple-yet-highly-effective risk-aware planning algorithm for offline RL.
arXiv Detail & Related papers (2022-11-06T07:42:24Z)
- Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
Offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy for imitating from adaptive neighboring policies with a higher return.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning merely mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- Conservative Offline Distributional Reinforcement Learning [34.95001490294207]
We propose Conservative Offline Distributional Actor Critic (CODAC) for both risk-neutral and risk-averse domains.
CODAC adapts distributional RL to the offline setting by penalizing the predicted quantiles of the return for out-of-distribution actions.
In experiments, CODAC successfully learns risk-averse policies using offline data collected purely from risk-neutral agents.
arXiv Detail & Related papers (2021-07-12T15:38:06Z)
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates.
arXiv Detail & Related papers (2021-02-18T08:54:14Z)
- Risk-Averse Offline Reinforcement Learning [46.383648750385575]
Training Reinforcement Learning (RL) agents in high-stakes applications can be prohibitive because of the risk associated with exploration.
We present the Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm that is able to learn risk-averse policies in a fully offline setting.
arXiv Detail & Related papers (2021-02-10T10:27:49Z)
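Several entries in the list above describe critics that learn quantiles of the return or cost-to-go distribution. For orientation only, here is a minimal sketch of the standard quantile-regression (pinball) loss that such critics are commonly trained with. It is not code from any of the listed papers; the function name `pinball_loss` and the example values are hypothetical.

```python
import numpy as np


def pinball_loss(predicted_quantiles, target, taus):
    """Mean pinball loss of quantile predictions against one scalar target.

    predicted_quantiles[i] is the estimate of the taus[i]-quantile.
    """
    predicted_quantiles = np.asarray(predicted_quantiles, dtype=float)
    taus = np.asarray(taus, dtype=float)
    diff = target - predicted_quantiles
    # Under-prediction (diff > 0) is weighted by tau,
    # over-prediction (diff < 0) by (1 - tau).
    return float(np.mean(np.maximum(taus * diff, (taus - 1.0) * diff)))


# Hypothetical quantile levels and predictions for a single observed return.
taus = [0.1, 0.5, 0.9]
predictions = [0.0, 1.0, 2.0]
print("pinball loss:", pinball_loss(predictions, target=1.0, taus=taus))
```

Because low quantile levels penalize over-prediction more heavily than under-prediction, minimizing this loss pushes each output toward its own quantile of the target distribution, which is what makes tail statistics such as CVaR available from the critic.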
This list is automatically generated from the titles and abstracts of the papers on this site.