Provably Efficient Algorithms for Multi-Objective Competitive RL
- URL: http://arxiv.org/abs/2102.03192v1
- Date: Fri, 5 Feb 2021 14:26:00 GMT
- Title: Provably Efficient Algorithms for Multi-Objective Competitive RL
- Authors: Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra
- Abstract summary: We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector.
In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set.
We develop statistically and computationally efficient algorithms to approach the associated target set.
- Score: 54.22598924633369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study multi-objective reinforcement learning (RL) where an agent's reward
is represented as a vector. In settings where an agent competes against
opponents, its performance is measured by the distance of its average return
vector to a target set. We develop statistically and computationally efficient
algorithms to approach the associated target set. Our results extend
Blackwell's approachability theorem (Blackwell, 1956) to tabular RL, where
strategic exploration becomes essential. The algorithms presented are adaptive;
their guarantees hold even without Blackwell's approachability condition. If
the opponents use fixed policies, we give an improved rate of approaching the
target set while also tackling the more ambitious goal of simultaneously
minimizing a scalar cost function. We discuss our analysis for this special
case by relating our results to previous works on constrained RL. To our
knowledge, this work provides the first provably efficient algorithms for
vector-valued Markov games and our theoretical guarantees are near-optimal.
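For readers outside this literature, a minimal sketch of the classical statement being extended may help (standard matrix-game notation; the paper's contribution is lifting this to tabular Markov games, where the payoffs must be learned through strategic exploration):

```latex
% Classical Blackwell approachability (matrix-game form; a sketch, not
% the paper's Markov-game formulation). Goal: drive the average vector
% payoff into a closed convex target set C, however the opponent plays:
\[
  \mathrm{dist}\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} u_t,\; \mathcal{C}\Big)
  \;\longrightarrow\; 0 \qquad \text{as } T \to \infty .
\]
% Blackwell's condition: C is approachable iff, for every unit vector
% \theta, the agent can force the scalarized payoff to be at least the
% smallest scalarized value attained inside C:
\[
  \max_{x \in \Delta(\mathcal{A})} \; \min_{y \in \Delta(\mathcal{B})}
  \big\langle \theta,\, u(x, y) \big\rangle
  \;\ge\; \min_{c \in \mathcal{C}} \langle \theta,\, c \rangle .
\]
```

Note that the adaptive algorithms in the paper retain guarantees even when this condition fails.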
Related papers
- RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning [7.407106653769627]
Preference-based Reinforcement Learning (PbRL) studies the problem where agents receive only preferences over pairs of trajectories in each episode.
Traditional risk-aware objectives and algorithms are not applicable in such one-episode-reward settings.
We introduce Risk-Aware-PbRL (RA-PbRL), an algorithm designed to optimize both nested and static risk-aware objectives.
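For orientation, the two objective types are usually distinguished as follows (a sketch using CVaR as the example risk measure; the paper's exact definitions may differ in detail):

```latex
% Illustrative definitions only (CVaR as the example risk measure rho).
% Static risk: apply rho once, to the whole-episode return:
\[
  \max_{\pi}\; \rho\Big(\textstyle\sum_{h=1}^{H} r_h\Big),
  \qquad \rho = \mathrm{CVaR}_{\alpha} .
\]
% Nested risk: apply rho recursively, one step at a time:
\[
  V_h^{\pi}(s) \;=\; \rho\big( r_h + V_{h+1}^{\pi}(s_{h+1}) \big),
  \qquad V_{H+1}^{\pi} \equiv 0 .
\]
```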
arXiv Detail & Related papers (2024-10-31T02:25:43Z)
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose C-AdvUL, a fast adversarial training algorithm composed of two losses: one for robustness to continuous embedding-space attacks and one for utility on clean data.
We further introduce C-AdvIPO, an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
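A rough sketch of the two-loss structure follows. Here "continuous attack" means a PGD-style perturbation in embedding space rather than discrete token swaps; the function and attribute names (`model.loss`, `model.embed`, batch fields) are assumptions for illustration, not the authors' code:

```python
import torch

def continuous_attack(model, embeds, harmful_target, eps=0.1, steps=10, lr=0.01):
    """Hypothetical sketch of a 'continuous attack': perturb the input
    embeddings inside an eps-ball so the model moves toward a harmful
    completion. `model.loss` and the argument names are assumptions."""
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):
        # the attacker lowers the loss toward the harmful completion
        loss = model.loss(embeds + delta, harmful_target)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= lr * grad.sign()
            delta.clamp_(-eps, eps)      # project back into the eps-ball
    return delta.detach()

def two_loss_step(model, safety_batch, utility_batch, alpha=1.0):
    """Sketch in the spirit of the two-loss objective (names and weighting
    are assumptions): robustness under attack on safety data, plus a
    utility loss on clean data."""
    embeds = model.embed(safety_batch.inputs)
    delta = continuous_attack(model, embeds, safety_batch.harmful_target)
    robust_loss = model.loss(embeds + delta, safety_batch.refusal_target)
    utility_loss = model.loss(model.embed(utility_batch.inputs), utility_batch.targets)
    return robust_loss + alpha * utility_loss
```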
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Discriminative Adversarial Unlearning [40.30974185546541]
We introduce a novel machine unlearning framework built on the established min-max optimization paradigm.
We capitalize on the capabilities of strong Membership Inference Attacks (MIA) to facilitate the unlearning of specific samples from a trained model.
Our proposed algorithm closely approximates the ideal benchmark of retraining from scratch for both random sample forgetting and class-wise forgetting schemes.
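The min-max scheme can be pictured roughly as follows (a generic sketch with hypothetical names, not the paper's exact algorithm): an MIA-style discriminator learns to flag forget-set samples as "still memorized", while the model is updated to fool it and to stay accurate on retained data.

```python
import torch

def unlearning_step(model, mia, forget_batch, retain_batch,
                    opt_model, opt_mia, lam=1.0):
    """Generic min-max unlearning sketch (hypothetical names). `mia` is a
    discriminator outputting membership probability in (0, 1)."""
    # max step: train the membership-inference attacker to separate
    # forget-set features from retain-set features
    f_forget = model.features(forget_batch.x).detach()
    f_retain = model.features(retain_batch.x).detach()
    mia_loss = -(torch.log(mia(f_forget)).mean()
                 + torch.log(1.0 - mia(f_retain)).mean())
    opt_mia.zero_grad()
    mia_loss.backward()
    opt_mia.step()

    # min step: make forget samples look like non-members, retain utility
    adv_loss = torch.log(mia(model.features(forget_batch.x))).mean()
    retain_loss = model.loss(retain_batch.x, retain_batch.y)
    total = retain_loss + lam * adv_loss
    opt_model.zero_grad()
    total.backward()
    opt_model.step()
    return total.item()
```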
arXiv Detail & Related papers (2024-02-10T03:04:57Z) - A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it requires neither training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
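For context, coefficients of this kind are density ratios between the comparator policy's trajectory distribution and the offline data distribution; roughly (standard-style definitions, a sketch rather than the paper's exact statement):

```latex
% Standard-flavor definitions (sketch; the paper's refined coefficient
% may differ in detail). With offline data distribution \mu over
% trajectories \tau and target policy \pi:
\[
  C_{\mathrm{traj}}(\pi) \;=\; \sup_{\tau}\;
  \frac{\Pr\nolimits^{\pi}(\tau)}{\mu(\tau)} .
\]
% A "single-policy" coefficient only requires this ratio to be bounded
% for the single comparator policy \pi^{*}, not uniformly over all
% candidate policies.
```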
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Truncating Trajectories in Monte Carlo Reinforcement Learning [48.97155920826079]
In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal.
We propose an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths.
We show that an appropriate truncation of the trajectories can succeed in improving performance.
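As a toy illustration of the idea (illustrative only, not the paper's allocation rule): under gamma-discounting a length-h prefix already captures a 1 - gamma**h fraction of the return scale, so a fixed interaction budget can be spread over many short trajectories plus a few long ones.

```python
def truncated_schedule(budget, horizon):
    """Toy a-priori budget allocation sketch (not the paper's rule):
    collect one trajectory per truncation length, halving the length
    each round, then spend the leftover budget on length-1 rollouts."""
    lengths, h = [], horizon
    while h > 1 and budget >= h:
        lengths.append(h)
        budget -= h
        h //= 2
    lengths += [1] * budget
    return lengths

schedule = truncated_schedule(budget=1000, horizon=256)
print(len(schedule), "trajectories;", sum(schedule), "env steps")
```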
arXiv Detail & Related papers (2023-05-07T19:41:57Z) - Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, the building block of a successful on-policy deep RL algorithm.
We compare our episode-based RL (ERL) algorithm to state-of-the-art step-based algorithms on many complex simulated robotic control tasks.
arXiv Detail & Related papers (2023-05-07T19:41:57Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
We take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary source of pessimism for offline reinforcement learning (RL).
We propose MSG, a practical offline RL algorithm that trains an ensemble of $Q$-functions with independently computed targets based on completely separate networks.
Our experiments on the popular D4RL and RL Unplugged offline RL benchmarks demonstrate that MSG with deep ensembles surpasses well-tuned state-of-the-art methods by a wide margin.
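The independence point can be made concrete with a small sketch (hypothetical names; simplified relative to the actual MSG implementation): each ensemble member bootstraps only from its own target network, and the policy is trained against a pessimistic lower confidence bound over the ensemble.

```python
import torch

def independent_targets(target_nets, batch, policy, gamma=0.99):
    """Sketch of MSG-style independent targets (simplified; names are
    assumptions): the i-th Q-function bootstraps ONLY from its own target
    network, instead of from a shared min/mean over the ensemble."""
    next_a = policy(batch.next_obs)
    targets = []
    with torch.no_grad():
        for tgt in target_nets:
            y = batch.reward + gamma * (1.0 - batch.done) * tgt(batch.next_obs, next_a)
            targets.append(y)     # ensemble member i regresses onto targets[i]
    return targets

def pessimistic_value(q_nets, obs, act, beta=4.0):
    """Policy objective sketch: a lower confidence bound over the ensemble
    (mean minus beta * std; beta is an assumed hyperparameter)."""
    qs = torch.stack([q(obs, act) for q in q_nets])
    return qs.mean(dim=0) - beta * qs.std(dim=0)
```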
arXiv Detail & Related papers (2022-05-27T01:30:12Z)
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)