Provable Reset-free Reinforcement Learning by No-Regret Reduction
- URL: http://arxiv.org/abs/2301.02389v3
- Date: Sat, 22 Jul 2023 20:49:30 GMT
- Title: Provable Reset-free Reinforcement Learning by No-Regret Reduction
- Authors: Hoai-An Nguyen, Ching-An Cheng
- Abstract summary: We propose a generic no-regret reduction to systematically design reset-free RL algorithms.
Our reduction turns the reset-free RL problem into a two-player game.
We show that achieving sublinear regret in this two-player game would imply learning a policy that has both sublinear performance regret and sublinear total number of resets in the original RL problem.
- Score: 13.800970428473134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) so far has limited real-world applications. One
key challenge is that typical RL algorithms heavily rely on a reset mechanism
to sample proper initial states; these reset mechanisms, in practice, are
expensive to implement due to the need for human intervention or heavily
engineered environments. To make learning more practical, we propose a generic
no-regret reduction to systematically design reset-free RL algorithms. Our
reduction turns the reset-free RL problem into a two-player game. We show that
achieving sublinear regret in this two-player game would imply learning a
policy that has both sublinear performance regret and sublinear total number of
resets in the original RL problem. This means that the agent eventually learns
to perform optimally and avoid resets. To demonstrate the effectiveness of this
reduction, we design an instantiation for linear Markov decision processes,
which is the first provably correct reset-free RL algorithm.
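The two-player reduction described above can be illustrated with a toy, self-contained sketch: a primal player runs exponential weights (a no-regret algorithm) over two actions under a Lagrangian reward, while a dual player raises a multiplier on observed resets. The two-action environment, learning rates, and all names here are illustrative assumptions, not the paper's actual algorithm for linear MDPs.

```python
import math
import random

def two_player_reduction(T=2000, eta_p=0.5, eta_d=0.05, seed=0):
    """Toy primal-dual game: maximize reward while driving resets to be rare."""
    rng = random.Random(seed)
    reward = [1.0, 0.6]       # action 0: high reward but risky; action 1: safe
    reset_prob = [0.5, 0.0]   # action 0 triggers a reset half the time
    w = [1.0, 1.0]            # primal player: exponential weights over actions
    lam, resets = 0.0, 0
    for _ in range(T):
        z = w[0] + w[1]
        p = [w[0] / z, w[1] / z]
        a = 0 if rng.random() < p[0] else 1
        did_reset = rng.random() < reset_prob[a]
        resets += did_reset
        # Primal no-regret step on the Lagrangian reward r(a) - lam * reset(a)
        for i in (0, 1):
            w[i] *= math.exp(eta_p * (reward[i] - lam * reset_prob[i]))
        z = w[0] + w[1]
        w = [w[0] / z, w[1] / z]  # renormalize to avoid overflow
        # Dual online-gradient step: each observed reset pushes lam up
        if did_reset:
            lam = min(lam + eta_d, 5.0)
    z = w[0] + w[1]
    return [w[0] / z, w[1] / z], lam, resets
```

As the dual variable grows, the Lagrangian reward of the risky action falls below the safe one, so the primal player shifts its policy toward the reset-free action and resets taper off, mirroring the "sublinear performance regret and sublinear total resets" guarantee in spirit.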
Related papers
- More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z)
- Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z)
- Contrastive Preference Learning: Learning from Human Feedback without RL [71.77024922527642]
We introduce Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions.
CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs.
arXiv Detail & Related papers (2023-10-20T16:37:56Z)
- When Learning Is Out of Reach, Reset: Generalization in Autonomous Visuomotor Reinforcement Learning [10.469509984098705]
Episodic training, where an agent's environment is reset after every success or failure, is the de facto standard when training embodied reinforcement learning (RL) agents.
In this work, we look to minimize, rather than completely eliminate, resets while building visual agents that can meaningfully generalize.
Our proposed approach significantly outperforms prior episodic, reset-free, and reset-minimizing approaches, achieving higher success rates.
arXiv Detail & Related papers (2023-03-30T17:59:26Z)
- Dual RL: Unification and New Methods for Reinforcement and Imitation Learning [26.59374102005998]
We first cast several state-of-the-art offline RL and offline imitation learning (IL) algorithms as instances of dual RL approaches with shared structures.
We propose a new discriminator-free method ReCOIL that learns to imitate from arbitrary off-policy data to obtain near-expert performance.
For offline RL, our analysis frames a recent offline RL method XQL in the dual framework, and we further propose a new method f-DVL that provides alternative choices to the Gumbel regression loss.
arXiv Detail & Related papers (2023-02-16T20:10:06Z)
- Delayed Geometric Discounts: An Alternative Criterion for Reinforcement Learning [1.52292571922932]
Reinforcement learning (RL) provides a theoretical framework for learning optimal behaviors.
In practice, RL algorithms rely on geometric discounts to evaluate this optimality.
In this paper, we tackle these issues by generalizing the discounted problem formulation with a family of delayed objective functions.
arXiv Detail & Related papers (2022-09-26T07:49:38Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
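The advantage computation described in this entry, the Q-value of the positive (observed) item measured against the average over sampled negative items, can be sketched as follows. The function and variable names are illustrative, not from the SA2C implementation.

```python
def advantage(q_positive, q_negatives):
    """Advantage of the positive action over the average sampled-negative case:
    A(s, a+) = Q(s, a+) - mean over negatives of Q(s, a-)."""
    baseline = sum(q_negatives) / len(q_negatives)
    return q_positive - baseline
```

A positive advantage indicates the observed item scores above the sampled baseline; this estimate then weights the actor's policy-gradient update.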
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- A Simple Reward-free Approach to Constrained Reinforcement Learning [33.813302183231556]
This paper bridges reward-free RL and constrained RL.
In particular, we propose a simple meta-algorithm such that, given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overhead in sample complexity.
arXiv Detail & Related papers (2021-07-12T06:27:30Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, while also verifying previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.