Adversarially Trained Actor Critic for offline CMDPs
- URL: http://arxiv.org/abs/2401.00629v1
- Date: Mon, 1 Jan 2024 01:44:58 GMT
- Title: Adversarially Trained Actor Critic for offline CMDPs
- Authors: Honghao Wei, Xiyue Peng, Xin Liu, Arnob Ghosh
- Abstract summary: We propose a Safe Adversarial Trained Actor Critic (SATAC) algorithm for offline reinforcement learning (RL).
Our framework provides both theoretical guarantees and a robust deep-RL implementation.
We demonstrate that SATAC can produce a policy that outperforms the behavior policy while maintaining the same level of safety.
- Score: 10.861449694255137
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a Safe Adversarial Trained Actor Critic (SATAC) algorithm for
offline reinforcement learning (RL) with general function approximation in the
presence of limited data coverage. SATAC operates as a two-player Stackelberg
game featuring a refined objective function. The actor (leader player)
optimizes the policy against two adversarially trained value critics (follower
players), who focus on scenarios where the actor's performance is inferior to
the behavior policy. Our framework provides both theoretical guarantees and a
robust deep-RL implementation. Theoretically, we demonstrate that when the
actor employs a no-regret optimization oracle, SATAC achieves two guarantees:
(i) For the first time in the offline RL setting, we establish that SATAC can
produce a policy that outperforms the behavior policy while maintaining the
same level of safety, which is critical to designing an algorithm for offline
RL. (ii) We demonstrate that the algorithm guarantees policy improvement across
a broad range of hyperparameters, indicating its practical robustness.
Additionally, we offer a practical version of SATAC and compare it with
existing state-of-the-art offline safe-RL algorithms in continuous control
environments. SATAC outperforms all baselines across a range of tasks, thus
validating the theoretical performance.
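To make the two-player structure concrete, here is a minimal, hypothetical sketch of one alternating SATAC-style update, assuming an ATAC-like relative-pessimism critic loss (the actor's advantage over logged actions plus a Bellman-consistency term) and a Lagrangian penalty on the cost critic in the actor step. All names, dimensions, and exact loss forms are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing ReLU

S, A = 17, 6                                   # toy dimensions
actor = mlp([S, 256, A])                       # leader: deterministic policy
q_r = mlp([S + A, 256, 1])                     # follower 1: reward critic
q_c = mlp([S + A, 256, 1])                     # follower 2: cost critic
opt_pi = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_q = torch.optim.Adam(
    list(q_r.parameters()) + list(q_c.parameters()), lr=3e-4)

beta, lam, gamma = 1.0, 1.0, 0.99  # Bellman weight, cost multiplier, discount

def q_val(q, s, a):
    return q(torch.cat([s, a], dim=-1)).squeeze(-1)

def critic_loss(q, s, a, sig, s2, done):
    # Relative pessimism: shrink the actor's advantage over logged actions
    # while staying Bellman-consistent on the data.
    with torch.no_grad():
        a_pi, a_pi2 = actor(s), actor(s2)
        target = sig + gamma * (1.0 - done) * q_val(q, s2, a_pi2)
    adv = q_val(q, s, a_pi) - q_val(q, s, a)
    bellman = ((q_val(q, s, a) - target) ** 2).mean()
    return adv.mean() + beta * bellman

def update(batch):
    s, a, r, c, s2, done = batch
    # Followers: both critics move adversarially against the current actor.
    lq = critic_loss(q_r, s, a, r, s2, done) + critic_loss(q_c, s, a, c, s2, done)
    opt_q.zero_grad(); lq.backward(); opt_q.step()
    # Leader: raise the reward advantage, penalize the cost advantage.
    adv_r = q_val(q_r, s, actor(s)) - q_val(q_r, s, a)
    adv_c = q_val(q_c, s, actor(s)) - q_val(q_c, s, a)
    lpi = -(adv_r.mean() - lam * adv_c.mean())
    opt_pi.zero_grad(); lpi.backward(); opt_pi.step()

B = 32  # one step on a random batch, just to show the interface
batch = (torch.randn(B, S), torch.randn(B, A), torch.randn(B),
         torch.rand(B), torch.randn(B, S), torch.zeros(B))
update(batch)
```

The Stackelberg ordering shows up in the update order: the critics (followers) respond to the current actor first, and the actor (leader) then improves against the freshly adversarial critics.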
Related papers
- Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy [4.854443247023496]
Offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset.
We propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals.
arXiv Detail & Related papers (2024-03-04T05:20:57Z)
- Safe Reinforcement Learning with Dual Robustness [10.455148541147796]
Reinforcement learning (RL) agents are vulnerable to adversarial disturbances.
We propose a systematic framework to unify safe RL and robust RL.
We also design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC).
arXiv Detail & Related papers (2023-09-13T09:34:21Z)
- Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty [43.55450683502937]
In this paper, we focus on action robust RL with probabilistic policy execution uncertainty.
We establish the existence of an optimal policy for action robust MDPs with probabilistic policy execution uncertainty.
We also develop an Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm that achieves minimax-optimal regret and sample complexity.
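As a toy illustration of the execution model studied there (an assumption-level sketch, not the paper's code): with probability rho, the intended action is replaced by an alternative action before the environment steps. The function and argument names below are illustrative.

```python
import random

def execute_with_uncertainty(env, intended_action, rho, alt_action_sampler):
    """Step `env` with `intended_action`, except that with probability
    `rho` an alternative action (e.g., random or adversarial) executes."""
    if random.random() < rho:
        action = alt_action_sampler()  # the agent does not control this branch
    else:
        action = intended_action
    return env.step(action)
```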
arXiv Detail & Related papers (2023-07-15T00:26:51Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
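A minimal sketch of that idea, assuming a policy network that returns a torch.distributions object and a generic critic q; the KL-regularized actor loss and the periodic reference refresh below are illustrative, not the paper's exact procedure.

```python
import copy
import torch

def actor_loss(pi, ref, q, states, alpha=1.0):
    # Maximize Q while staying close (in KL) to a frozen reference policy.
    dist = pi(states)              # assumed to return a Distribution
    actions = dist.rsample()       # reparameterized sample for backprop
    with torch.no_grad():
        ref_dist = ref(states)
    kl = torch.distributions.kl_divergence(dist, ref_dist).mean()
    return -(q(states, actions).mean() - alpha * kl)

def maybe_refresh_reference(pi, ref, step, refresh_every=1000):
    # Iteratively refine the reference: snapshot the current policy so the
    # regularizer tracks the improved policy rather than the raw behavior data.
    if step % refresh_every == 0:
        ref.load_state_dict(copy.deepcopy(pi.state_dict()))
```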
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- A Strong Baseline for Batch Imitation Learning [25.392006064406967]
We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm.
This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern.
arXiv Detail & Related papers (2023-02-06T14:03:33Z)
- Adversarially Trained Actor Critic for Offline Reinforcement Learning [42.42451519801851]
ATAC is a new model-free algorithm for offline reinforcement learning under insufficient data coverage.
In the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.
arXiv Detail & Related papers (2022-02-05T01:02:46Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
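A minimal sketch of the weighting idea, substituting a small target-critic ensemble for UWAC's Monte-Carlo-dropout uncertainty estimate; the inverse-variance weight form is an illustrative assumption.

```python
import torch

def uncertainty_weighted_td_loss(q, target_critics, s, a, r, s2, a2,
                                 gamma=0.99, beta=1.0):
    # `target_critics`: ensemble standing in for the dropout-based
    # uncertainty estimate; each maps (state, action) -> Q value, shape (B,).
    with torch.no_grad():
        preds = torch.stack([qt(s2, a2) for qt in target_critics])  # (E, B)
        target = r + gamma * preds.mean(dim=0)
        var = preds.var(dim=0)
        weight = beta / (var + beta)   # down-weight high-uncertainty targets
    td_error = (q(s, a) - target) ** 2
    return (weight * td_error).mean()
```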
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- Provably Efficient Algorithms for Multi-Objective Competitive RL [54.22598924633369]
We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector.
In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set.
We develop statistically and computationally efficient algorithms to approach the associated target set.
arXiv Detail & Related papers (2021-02-05T14:26:00Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.