User-Oriented Robust Reinforcement Learning
- URL: http://arxiv.org/abs/2202.07301v2
- Date: Thu, 17 Feb 2022 12:10:24 GMT
- Title: User-Oriented Robust Reinforcement Learning
- Authors: Haoyi You, Beichen Yu, Haiming Jin, Zhaoxing Yang, Jiahui Sun, Xinbing Wang
- Abstract summary: We propose a novel User-Oriented Robust RL (UOR-RL) framework for policy learning.
We define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference.
We prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate knowledge of, or no knowledge at all about, the environment distribution.
- Score: 25.02456730639135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, improving the robustness of policies across different environments
has attracted increasing attention in the reinforcement learning (RL) community.
Existing robust RL methods mostly aim to achieve the max-min robustness by
optimizing the policy's performance in the worst-case environment. However, in
practice, a user of an RL policy may have different preferences over its
performance across environments. Clearly, the aforementioned max-min robustness
is oftentimes too conservative to satisfy user preference. Therefore, in this
paper, we integrate user preference into policy learning in robust RL, and
propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we
define a new User-Oriented Robustness (UOR) metric for RL, which allocates
different weights to the environments according to user preference and
generalizes the max-min robustness metric. To optimize the UOR metric, we
develop two different UOR-RL training algorithms for the scenarios with or
without a priori known environment distribution, respectively. Theoretically,
we prove that our UOR-RL training algorithms converge to near-optimal policies
even with inaccurate knowledge of, or no knowledge at all about, the environment
distribution. Furthermore, we carry out extensive experimental evaluations in 4
MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to
the state-of-the-art baselines under the average and worst-case performance
metrics, and more importantly establishes new state-of-the-art performance
under the UOR metric.
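To make the weighting idea concrete, below is a minimal, illustrative sketch in Python. The function name `uor_score` and the rank-based weighting are assumptions made here for illustration; they do not reproduce the paper's exact UOR definition, but show how user-preference weights over per-environment returns can generalize max-min robustness.

```python
import numpy as np

def uor_score(env_returns, preference_weights):
    """Illustrative preference-weighted robustness score (not the paper's exact metric).

    env_returns[i]        : the policy's expected return in environment i.
    preference_weights[k] : non-negative weight the user assigns to the k-th
                            worst-performing environment (worst first).
    """
    returns = np.sort(np.asarray(env_returns, dtype=float))  # worst -> best
    w = np.asarray(preference_weights, dtype=float)
    w = w / w.sum()                                           # normalize user preference
    return float(np.dot(w, returns))                          # weighted aggregate return

env_returns = [120.0, 95.0, 310.0]

# All weight on the worst environment recovers max-min robustness (score = 95.0).
print(uor_score(env_returns, [1.0, 0.0, 0.0]))

# A user who cares mostly, but not only, about unfavorable environments.
print(uor_score(env_returns, [0.6, 0.3, 0.1]))
```

Under this reading, max-min robustness is the special case with all preference mass on the worst environment, while uniform weights recover average-case performance.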
Related papers
- Reinforcement Learning Policy as Macro Regulator Rather than Macro Placer [22.46061028295081]
Reinforcement learning has emerged as a promising technique for improving placement quality.
Current RL-based placement methods suffer from long training times, low generalization ability, and inability to guarantee PPA results.
We propose an approach that utilizes RL for the refinement stage, which allows the RL policy to learn how to adjust existing placement layouts.
We evaluate our approach on the ISPD 2005 and ICCAD 2015 benchmarks, comparing the global half-perimeter wirelength and regularity of our proposed method against several competitive approaches.
arXiv Detail & Related papers (2024-12-10T04:01:21Z)
- DPO: Differential reinforcement learning with application to optimal configuration search [3.2857981869020327]
Reinforcement learning with continuous state and action spaces remains one of the most challenging problems within the field.
We propose the first differential RL framework that can handle settings with limited training samples and short-length episodes.
arXiv Detail & Related papers (2024-04-24T03:11:12Z)
- Hyperparameters in Reinforcement Learning and How To Tune Them [25.782420501870295]
We show that hyperparameter choices in deep reinforcement learning can significantly affect the agent's final performance and sample efficiency.
We propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds (a toy sketch follows this entry).
We support this by comparing state-of-the-art HPO tools against hand-tuned counterparts on a range of RL algorithms and environments.
arXiv Detail & Related papers (2023-06-02T07:48:18Z)
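As a toy illustration of the tuning/testing seed separation mentioned in the entry above, the sketch below (all names, including the `train_and_evaluate` stand-in, are hypothetical placeholders rather than the paper's tooling) selects a hyperparameter configuration on one set of seeds and reports it only on disjoint, held-out seeds:

```python
import random

def train_and_evaluate(hyperparams: dict, seed: int) -> float:
    """Hypothetical stand-in: train an RL agent with `hyperparams` and `seed`,
    and return its final evaluation return."""
    random.seed(seed)
    return random.gauss(hyperparams["lr"] * 1000.0, 10.0)  # placeholder for a real run

TUNING_SEEDS = [0, 1, 2]      # used only while searching hyperparameters
TEST_SEEDS = [100, 101, 102]  # held out, used only for the final report

candidates = [{"lr": 1e-4}, {"lr": 3e-4}, {"lr": 1e-3}]

# Pick the configuration by its mean return over the tuning seeds ...
best = max(
    candidates,
    key=lambda hp: sum(train_and_evaluate(hp, s) for s in TUNING_SEEDS) / len(TUNING_SEEDS),
)

# ... and report its performance only on the disjoint test seeds.
test_score = sum(train_and_evaluate(best, s) for s in TEST_SEEDS) / len(TEST_SEEDS)
print(best, test_score)
```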
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta-RL (MRL) methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The resulting data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL)
It is based on differentiable trust region layers, a successful on-policy deep RL algorithm.
We compare our episode-based RL (ERL) algorithm to state-of-the-art step-based algorithms on many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- B-Pref: Benchmarking Preference-Based Reinforcement Learning [84.41494283081326]
We introduce B-Pref, a benchmark specially designed for preference-based RL.
A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly.
B-Pref alleviates this by simulating teachers with a wide array of irrationalities (a toy sketch follows this entry).
arXiv Detail & Related papers (2021-11-04T17:32:06Z)
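As a toy illustration of a simulated teacher with irrationalities in the spirit of B-Pref, the sketch below (the function and its parameters are assumptions for illustration, not B-Pref's actual API) prefers the higher-return segment but sometimes errs and abstains on near-ties:

```python
import random

def simulated_teacher(return_a: float, return_b: float,
                      error_prob: float = 0.1, skip_margin: float = 1.0):
    """Return 0 if segment A is preferred, 1 if segment B is preferred,
    or None if the teacher skips an ambiguous query."""
    if abs(return_a - return_b) < skip_margin:
        return None                       # near-tie: teacher declines to answer
    preferred = 0 if return_a > return_b else 1
    if random.random() < error_prob:      # occasional labeling mistake
        preferred = 1 - preferred
    return preferred

print(simulated_teacher(12.0, 3.5))  # usually 0, occasionally flipped to 1
```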
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- Pareto Deterministic Policy Gradients and Its Application in 5G Massive MIMO Networks [32.099949375036495]
We consider jointly optimizing cell load balance and network throughput via a reinforcement learning (RL) approach.
Our rationale behind using RL is to circumvent the challenges of analytically modeling user mobility and network dynamics.
To accomplish this joint optimization, we integrate vector rewards into the RL value network and conduct RL action via a separate policy network.
arXiv Detail & Related papers (2020-12-02T15:35:35Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.