Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression
- URL: http://arxiv.org/abs/2410.19400v4
- Date: Fri, 01 Nov 2024 07:20:10 GMT
- Title: Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression
- Authors: Yixiu Mao, Qi Wang, Chen Chen, Yun Qu, Xiangyang Ji
- Abstract summary: In offline reinforcement learning (RL), addressing the out-of-distribution (OOD) action issue has been a focus.
We argue that there exists an OOD state issue that also impairs performance yet has been underexplored.
We propose SCAS, a simple yet effective approach that unifies OOD state correction and OOD action suppression in offline RL.
- Score: 47.598803055066554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In offline reinforcement learning (RL), addressing the out-of-distribution (OOD) action issue has been a focus, but we argue that there exists an OOD state issue that also impairs performance yet has been underexplored. Such an issue describes the scenario when the agent encounters states out of the offline dataset during the test phase, leading to uncontrolled behavior and performance degradation. To this end, we propose SCAS, a simple yet effective approach that unifies OOD state correction and OOD action suppression in offline RL. Technically, SCAS achieves value-aware OOD state correction, capable of correcting the agent from OOD states to high-value in-distribution states. Theoretical and empirical results show that SCAS also exhibits the effect of suppressing OOD actions. On standard offline RL benchmarks, SCAS achieves excellent performance without additional hyperparameter tuning. Moreover, benefiting from its OOD state correction feature, SCAS demonstrates enhanced robustness against environmental perturbations.
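A minimal sketch of the value-aware OOD state correction idea described above. This is not the official SCAS implementation: the network shapes, the learned dynamics model, and the softmax value-weighting scheme below are assumptions made only to illustrate "correcting the agent from OOD states to high-value in-distribution states".

```python
# Illustrative sketch of value-aware OOD state correction (assumed form, not
# the authors' code): regress a dynamics-predicted successor state onto
# dataset states, weighting targets by their value so the correction steers
# the agent toward high-value in-distribution states.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def value_aware_state_correction_loss(policy, dynamics, value, s, s_next, beta=1.0):
    """Weighted regression of predicted successors onto in-distribution
    next states; beta controls how strongly high-value states are favored."""
    a = policy(s)                               # action proposed at a (possibly OOD) state
    s_pred = dynamics(torch.cat([s, a], -1))    # predicted successor under a learned model
    with torch.no_grad():
        # Value-aware weights over dataset next states (hypothetical choice).
        w = torch.softmax(beta * value(s_next).squeeze(-1), dim=0)
    return (w * ((s_pred - s_next) ** 2).mean(-1)).sum()

if __name__ == "__main__":
    S, A = 17, 6                                # hypothetical state/action dims
    policy, dynamics, value = MLP(S, A), MLP(S + A, S), MLP(S, 1)
    s, s_next = torch.randn(32, S), torch.randn(32, S)
    loss = value_aware_state_correction_loss(policy, dynamics, value, s, s_next)
    loss.backward()
    print(float(loss))
```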
Related papers
- Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach [11.836153064242811]
Offline reinforcement learning (RL) aims to learn decision-making policies from fixed datasets without online interactions.
We propose Advantage-based Diffusion Actor-Critic (ADAC), a novel method that systematically evaluates OOD actions.
ADAC achieves state-of-the-art performance on almost all tasks in the D4RL benchmark.
arXiv Detail & Related papers (2025-05-08T10:57:28Z) - HALO: Robust Out-of-Distribution Detection via Joint Optimisation [11.107924895663173]
Effective out-of-distribution (OOD) detection is crucial for the safe deployment of machine learning models in real-world scenarios.
Recent work has shown that OOD detection methods are vulnerable to adversarial attacks, potentially leading to critical failures in high-stakes applications.
We introduce an additional loss term which boosts classification and detection performance.
Our approach, called HALO, surpasses existing methods and achieves state-of-the-art performance across a number of datasets and attack settings.
arXiv Detail & Related papers (2025-02-27T04:40:18Z) - The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection [75.65876949930258]
Out-of-distribution (OOD) detection is essential for model trustworthiness.
We show that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing the OOD generalization ability.
arXiv Detail & Related papers (2024-10-12T07:02:04Z) - Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution [38.844580833635725]
We present a training-time regularization technique to mitigate the bias and boost imbalanced OOD detectors across architecture designs.
Our method translates into consistent improvements on the representative CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks.
arXiv Detail & Related papers (2024-07-23T12:28:59Z) - A Survey on Evaluation of Out-of-Distribution Generalization [41.39827887375374]
Out-of-Distribution (OOD) generalization is a complex and fundamental problem.
This paper serves as the first effort to conduct a comprehensive review of OOD evaluation.
We categorize existing research into three paradigms: OOD performance testing, OOD performance prediction, and OOD intrinsic property characterization.
arXiv Detail & Related papers (2024-03-04T09:30:35Z) - AUTO: Adaptive Outlier Optimization for Online Test-Time OOD Detection [81.49353397201887]
Out-of-distribution (OOD) detection is crucial to deploying machine learning models in open-world applications.
We introduce a novel paradigm called test-time OOD detection, which utilizes unlabeled online data directly at test time to improve OOD detection performance.
We propose adaptive outlier optimization (AUTO), which consists of an in-out-aware filter, an ID memory bank, and a semantically-consistent objective.
arXiv Detail & Related papers (2023-03-22T02:28:54Z) - Out-of-distribution Detection with Implicit Outlier Transformation [72.73711947366377]
Outlier exposure (OE) is powerful in out-of-distribution (OOD) detection.
We propose a novel OE-based approach that makes the model perform well for unseen OOD situations.
arXiv Detail & Related papers (2023-03-09T04:36:38Z) - Average of Pruning: Improving Performance and Stability of Out-of-Distribution Detection [37.43981354073841]
We find the performance of OOD detection suffers from overfitting and instability during training.
We propose Average of Pruning (AoP), consisting of model averaging and pruning, to mitigate the unstable behaviors.
arXiv Detail & Related papers (2023-03-02T12:34:38Z) - Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly (a hedged sketch of this weighting appears after this list).
arXiv Detail & Related papers (2021-05-17T20:16:46Z) - ATOM: Robustifying Out-of-distribution Detection Using Outlier Mining [51.19164318924997]
Adversarial Training with informative Outlier Mining (ATOM) improves the robustness of OOD detection.
ATOM achieves state-of-the-art performance under a broad family of classic and adversarial OOD evaluation tasks.
arXiv Detail & Related papers (2020-06-26T20:58:05Z) - Robust Out-of-distribution Detection for Neural Networks [51.19164318924997]
We show that existing detection mechanisms can be extremely brittle when evaluating on in-distribution and OOD inputs.
We propose an effective algorithm called ALOE, which performs robust training by exposing the model to both adversarially crafted inlier and outlier examples.
arXiv Detail & Related papers (2020-03-21T17:46:28Z)
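As referenced in the UWAC entry above, here is a minimal sketch of an uncertainty-weighted critic loss in that spirit. It is illustrative only: UWAC itself uses dropout-based uncertainty, whereas this sketch uses an ensemble-variance proxy, and the clipping, beta, and network sizes are assumptions.

```python
# Sketch of down-weighting TD errors by target-value uncertainty so that
# bootstrapping from OOD state-action pairs contributes less to training
# (assumed ensemble-based form, not the original UWAC implementation).
import torch
import torch.nn as nn

def make_critic(s_dim, a_dim, hidden=256):
    return nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, 1))

def uncertainty_weighted_td_loss(q_ensemble, target_ensemble, s, a, r,
                                 s_next, a_next, gamma=0.99, beta=1.0):
    """Compute a TD loss whose per-sample weight shrinks as the ensemble
    variance of the bootstrapped target grows."""
    with torch.no_grad():
        target_qs = torch.stack(
            [q(torch.cat([s_next, a_next], -1)) for q in target_ensemble])
        var = target_qs.var(dim=0).squeeze(-1)          # epistemic-uncertainty proxy
        weight = torch.clamp(beta / (var + 1e-6), max=1.0)  # down-weight uncertain targets
        target = r + gamma * target_qs.mean(dim=0).squeeze(-1)
    loss = 0.0
    for q in q_ensemble:
        td = q(torch.cat([s, a], -1)).squeeze(-1) - target
        loss = loss + (weight * td.pow(2)).mean()
    return loss

if __name__ == "__main__":
    S, A = 17, 6                                         # hypothetical dims
    critics = [make_critic(S, A) for _ in range(2)]
    targets = [make_critic(S, A) for _ in range(2)]
    s, s2 = torch.randn(32, S), torch.randn(32, S)
    a, a2 = torch.randn(32, A), torch.randn(32, A)
    r = torch.randn(32)
    print(float(uncertainty_weighted_td_loss(critics, targets, s, a, r, s2, a2)))
```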
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.