Provably Efficient Algorithms for Multi-Objective Competitive RL
- URL: http://arxiv.org/abs/2102.03192v1
- Date: Fri, 5 Feb 2021 14:26:00 GMT
- Title: Provably Efficient Algorithms for Multi-Objective Competitive RL
- Authors: Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra
- Abstract summary: We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector.
In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set.
We develop statistically and computationally efficient algorithms to approach the associated target set.
- Score: 54.22598924633369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study multi-objective reinforcement learning (RL) where an agent's reward
is represented as a vector. In settings where an agent competes against
opponents, its performance is measured by the distance of its average return
vector to a target set. We develop statistically and computationally efficient
algorithms to approach the associated target set. Our results extend
Blackwell's approachability theorem (Blackwell, 1956) to tabular RL, where
strategic exploration becomes essential. The algorithms presented are adaptive;
their guarantees hold even without Blackwell's approachability condition. If
the opponents use fixed policies, we give an improved rate of approaching the
target set while also tackling the more ambitious goal of simultaneously
minimizing a scalar cost function. We discuss our analysis for this special
case by relating our results to previous works on constrained RL. To our
knowledge, this work provides the first provably efficient algorithms for
vector-valued Markov games and our theoretical guarantees are near-optimal.
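For readers outside this literature, a minimal sketch of the classical statement being extended may help (standard matrix-game notation; the paper's contribution is lifting this to tabular Markov games, where the payoffs must be learned through strategic exploration):

```latex
% Classical Blackwell approachability (matrix-game form; a sketch, not
% the paper's Markov-game formulation). Goal: drive the average vector
% payoff into a closed convex target set C, however the opponent plays:
\[
  \mathrm{dist}\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} u_t,\; \mathcal{C}\Big)
  \;\longrightarrow\; 0 \qquad \text{as } T \to \infty .
\]
% Blackwell's condition: C is approachable iff, for every unit vector
% \theta, the agent can force the scalarized payoff to be at least the
% smallest scalarized value attained inside C:
\[
  \max_{x \in \Delta(\mathcal{A})} \; \min_{y \in \Delta(\mathcal{B})}
  \big\langle \theta,\, u(x, y) \big\rangle
  \;\ge\; \min_{c \in \mathcal{C}} \langle \theta,\, c \rangle .
\]
```

Note that the adaptive algorithms in the paper retain guarantees even when this condition fails.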
Related papers
- RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning [7.407106653769627]
Preference-based Reinforcement Learning (PbRL) studies the problem where agents receive only preferences over pairs of trajectories in each episode.
Traditional risk-aware objectives and algorithms are not applicable in such one-episode-reward settings.
We introduce Risk-Aware-PbRL (RA-PbRL), an algorithm designed to optimize both nested and static risk-aware objectives.
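For orientation, the two objective types are usually distinguished as follows (a sketch using CVaR as the example risk measure; the paper's exact definitions may differ in detail):

```latex
% Illustrative definitions only (CVaR as the example risk measure rho).
% Static risk: apply rho once, to the whole-episode return:
\[
  \max_{\pi}\; \rho\Big(\textstyle\sum_{h=1}^{H} r_h\Big),
  \qquad \rho = \mathrm{CVaR}_{\alpha} .
\]
% Nested risk: apply rho recursively, one step at a time:
\[
  V_h^{\pi}(s) \;=\; \rho\big( r_h + V_{h+1}^{\pi}(s_{h+1}) \big),
  \qquad V_{H+1}^{\pi} \equiv 0 .
\]
```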
arXiv Detail & Related papers (2024-10-31T02:25:43Z)
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose C-AdvUL, a fast adversarial training algorithm composed of two losses: one for robustness to continuous embedding-space attacks and one for utility on clean data.
We further introduce C-AdvIPO, an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
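A rough sketch of the two-loss structure follows. Here "continuous attack" means a PGD-style perturbation in embedding space rather than discrete token swaps; the function and attribute names (`model.loss`, `model.embed`, batch fields) are assumptions for illustration, not the authors' code:

```python
import torch

def continuous_attack(model, embeds, harmful_target, eps=0.1, steps=10, lr=0.01):
    """Hypothetical sketch of a 'continuous attack': perturb the input
    embeddings inside an eps-ball so the model moves toward a harmful
    completion. `model.loss` and the argument names are assumptions."""
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):
        # the attacker lowers the loss toward the harmful completion
        loss = model.loss(embeds + delta, harmful_target)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= lr * grad.sign()
            delta.clamp_(-eps, eps)      # project back into the eps-ball
    return delta.detach()

def two_loss_step(model, safety_batch, utility_batch, alpha=1.0):
    """Sketch in the spirit of the two-loss objective (names and weighting
    are assumptions): robustness under attack on safety data, plus a
    utility loss on clean data."""
    embeds = model.embed(safety_batch.inputs)
    delta = continuous_attack(model, embeds, safety_batch.harmful_target)
    robust_loss = model.loss(embeds + delta, safety_batch.refusal_target)
    utility_loss = model.loss(model.embed(utility_batch.inputs), utility_batch.targets)
    return robust_loss + alpha * utility_loss
```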
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Discriminative Adversarial Unlearning [40.30974185546541]
We introduce a novel machine unlearning framework built on the established min-max optimization paradigm.
We capitalize on the capabilities of strong Membership Inference Attacks (MIA) to facilitate the unlearning of specific samples from a trained model.
Our proposed algorithm closely approximates the ideal benchmark of retraining from scratch for both random sample forgetting and class-wise forgetting schemes.
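The min-max scheme can be pictured roughly as follows (a generic sketch with hypothetical names, not the paper's exact algorithm): an MIA-style discriminator learns to flag forget-set samples as "still memorized", while the model is updated to fool it and to stay accurate on retained data.

```python
import torch

def unlearning_step(model, mia, forget_batch, retain_batch,
                    opt_model, opt_mia, lam=1.0):
    """Generic min-max unlearning sketch (hypothetical names). `mia` is a
    discriminator outputting membership probability in (0, 1)."""
    # max step: train the membership-inference attacker to separate
    # forget-set features from retain-set features
    f_forget = model.features(forget_batch.x).detach()
    f_retain = model.features(retain_batch.x).detach()
    mia_loss = -(torch.log(mia(f_forget)).mean()
                 + torch.log(1.0 - mia(f_retain)).mean())
    opt_mia.zero_grad()
    mia_loss.backward()
    opt_mia.step()

    # min step: make forget samples look like non-members, retain utility
    adv_loss = torch.log(mia(model.features(forget_batch.x))).mean()
    retain_loss = model.loss(retain_batch.x, retain_batch.y)
    total = retain_loss + lam * adv_loss
    opt_model.zero_grad()
    total.backward()
    opt_model.step()
    return total.item()
```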
arXiv Detail & Related papers (2024-02-10T03:04:57Z) - A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it requires neither training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
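For context, coefficients of this kind are density ratios between the comparator policy's trajectory distribution and the offline data distribution; roughly (standard-style definitions, a sketch rather than the paper's exact statement):

```latex
% Standard-flavor definitions (sketch; the paper's refined coefficient
% may differ in detail). With offline data distribution \mu over
% trajectories \tau and target policy \pi:
\[
  C_{\mathrm{traj}}(\pi) \;=\; \sup_{\tau}\;
  \frac{\Pr\nolimits^{\pi}(\tau)}{\mu(\tau)} .
\]
% A "single-policy" coefficient only requires this ratio to be bounded
% for the single comparator policy \pi^{*}, not uniformly over all
% candidate policies.
```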
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Truncating Trajectories in Monte Carlo Reinforcement Learning [48.97155920826079]
In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal.
We propose an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths.
We show that an appropriate truncation of the trajectories can succeed in improving performance.
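As a toy illustration of the idea (illustrative only, not the paper's allocation rule): under gamma-discounting a length-h prefix already captures a 1 - gamma**h fraction of the return scale, so a fixed interaction budget can be spread over many short trajectories plus a few long ones.

```python
def truncated_schedule(budget, horizon):
    """Toy a-priori budget allocation sketch (not the paper's rule):
    collect one trajectory per truncation length, halving the length
    each round, then spend the leftover budget on length-1 rollouts."""
    lengths, h = [], horizon
    while h > 1 and budget >= h:
        lengths.append(h)
        budget -= h
        h //= 2
    lengths += [1] * budget
    return lengths

schedule = truncated_schedule(budget=1000, horizon=256)
print(len(schedule), "trajectories;", sum(schedule), "env steps")
```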
arXiv Detail & Related papers (2023-05-07T19:41:57Z) - Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, the building block of a successful on-policy deep RL algorithm.
We compare our episode-based RL (ERL) algorithm to state-of-the-art step-based algorithms on many complex simulated robotic control tasks.
arXiv Detail & Related papers (2023-05-07T19:41:57Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
We take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary source of pessimism for offline reinforcement learning (RL).
We propose MSG, a practical offline RL algorithm that trains an ensemble of $Q$-functions with independently computed targets based on completely separate networks.
Our experiments on the popular D4RL and RL Unplugged offline RL benchmarks demonstrate that MSG with deep ensembles surpasses well-tuned state-of-the-art methods by a wide margin.
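The independence point can be made concrete with a small sketch (hypothetical names; simplified relative to the actual MSG implementation): each ensemble member bootstraps only from its own target network, and the policy is trained against a pessimistic lower confidence bound over the ensemble.

```python
import torch

def independent_targets(target_nets, batch, policy, gamma=0.99):
    """Sketch of MSG-style independent targets (simplified; names are
    assumptions): the i-th Q-function bootstraps ONLY from its own target
    network, instead of from a shared min/mean over the ensemble."""
    next_a = policy(batch.next_obs)
    targets = []
    with torch.no_grad():
        for tgt in target_nets:
            y = batch.reward + gamma * (1.0 - batch.done) * tgt(batch.next_obs, next_a)
            targets.append(y)     # ensemble member i regresses onto targets[i]
    return targets

def pessimistic_value(q_nets, obs, act, beta=4.0):
    """Policy objective sketch: a lower confidence bound over the ensemble
    (mean minus beta * std; beta is an assumed hyperparameter)."""
    qs = torch.stack([q(obs, act) for q in q_nets])
    return qs.mean(dim=0) - beta * qs.std(dim=0)
```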
arXiv Detail & Related papers (2022-05-27T01:30:12Z)
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)