A Simple Reward-free Approach to Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2107.05216v1
- Date: Mon, 12 Jul 2021 06:27:30 GMT
- Title: A Simple Reward-free Approach to Constrained Reinforcement Learning
- Authors: Sobhan Miryoosefi, Chi Jin
- Abstract summary: This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity.
- Score: 33.813302183231556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In constrained reinforcement learning (RL), a learning agent seeks to not
only optimize the overall reward but also satisfy the additional safety,
diversity, or budget constraints. Consequently, existing constrained RL
solutions require several new algorithmic ingredients that are notably
different from standard RL. On the other hand, reward-free RL is independently
developed in the unconstrained literature, which learns the transition dynamics
without using the reward information, and is thus naturally capable of addressing
RL with multiple objectives under the common dynamics. This paper bridges
reward-free RL and constrained RL. Particularly, we propose a simple
meta-algorithm such that given any reward-free RL oracle, the approachability
and constrained RL problems can be directly solved with negligible overheads in
sample complexity. Utilizing the existing reward-free RL solvers, our framework
provides sharp sample complexity results for constrained RL in the tabular MDP
setting, matching the best existing results up to a factor of horizon
dependence; our framework directly extends to a setting of tabular two-player
Markov games, and gives a new result for constrained RL with linear function
Related papers
- Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6\%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
arXiv Detail & Related papers (2024-02-13T23:29:09Z)
- More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term, compared to RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- Is RLHF More Difficult than Standard RL? [31.972393805014903]
Reinforcement Learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) learns directly from reward signals.
This paper theoretically proves that, for a wide range of preference models, we can solve preference-based RL directly using existing algorithms and techniques for reward-based RL, with small or no extra costs.
arXiv Detail & Related papers (2023-06-25T03:18:15Z)
- LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Hierarchical Reinforcement Learning for Relay Selection and Power Optimization in Two-Hop Cooperative Relay Network [7.5377621697101205]
We study the outage probability minimization problem subject to a total transmission power constraint in a two-hop cooperative relay network.
We use reinforcement learning (RL) methods to learn strategies for relay selection and power allocation.
We propose a hierarchical reinforcement learning (HRL) framework and training algorithm.
arXiv Detail & Related papers (2020-11-10T04:47:41Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
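The SUNRISE entry above mentions selecting actions by the highest upper-confidence bound computed from a Q-ensemble. A minimal sketch of that idea follows, assuming the bound takes the common form "ensemble mean plus a multiple of the ensemble standard deviation". The function name `ucb_action`, the weight `lam`, and the list-of-lists layout are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def ucb_action(q_ensemble, lam=1.0):
    """Return the index of the action with the highest mean + lam * std.

    q_ensemble: one list of Q-values per ensemble member, one value per
                action, all evaluated at the same state (assumed layout).
    lam:        weight on the uncertainty bonus (assumed hyperparameter).
    """
    n_actions = len(q_ensemble[0])
    scores = []
    for a in range(n_actions):
        # Collect every member's estimate for action a.
        values = [member[a] for member in q_ensemble]
        # Disagreement between members acts as the exploration bonus.
        scores.append(mean(values) + lam * pstdev(values))
    return max(range(n_actions), key=scores.__getitem__)


# Three ensemble members, two actions: the members agree on action 0
# and disagree strongly on action 1.
q_values = [
    [1.0, 0.0],
    [1.0, 2.0],
    [1.0, 0.4],
]
greedy = ucb_action(q_values, lam=0.0)      # ignores uncertainty
optimistic = ucb_action(q_values, lam=1.0)  # rewards uncertainty
```

In this toy input the greedy choice is action 0 (higher mean), while the optimistic choice switches to action 1 because the members disagree about it. Setting `lam` to zero recovers plain greedy selection on the ensemble mean.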
This list is automatically generated from the titles and abstracts of the papers on this site.