Reinforcement Learning with Simple Sequence Priors
- URL: http://arxiv.org/abs/2305.17109v1
- Date: Fri, 26 May 2023 17:18:14 GMT
- Title: Reinforcement Learning with Simple Sequence Priors
- Authors: Tankred Saanum, Noémi Éltető, Peter Dayan, Marcel Binz, Eric Schulz
- Abstract summary: We propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible.
We show that the resulting RL algorithm leads to faster learning and attains higher returns than state-of-the-art model-free approaches.
- Score: 9.869634509510016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Everything else being equal, simpler models should be preferred over more
complex ones. In reinforcement learning (RL), simplicity is typically
quantified on an action-by-action basis -- but this timescale ignores temporal
regularities, like repetitions, often present in sequential strategies. We
therefore propose an RL algorithm that learns to solve tasks with sequences of
actions that are compressible. We explore two possible sources of simple action
sequences: Sequences that can be learned by autoregressive models, and
sequences that are compressible with off-the-shelf data compression algorithms.
Distilling these preferences into sequence priors, we derive a novel
information-theoretic objective that incentivizes agents to learn policies that
maximize rewards while conforming to these priors. We show that the resulting
RL algorithm leads to faster learning and attains higher returns than
state-of-the-art model-free approaches in a series of continuous control tasks
from the DeepMind Control Suite. These priors also produce a powerful
information-regularized agent that is robust to noisy observations and can
perform open-loop control.
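As a rough, hedged illustration of the objective described above, the sketch below adds a bonus to the environment reward for actions that are predictable under an autoregressive prior over the action sequence, and also shows an off-the-shelf compressor (zlib) as an alternative measure of sequence simplicity. Discretized actions, the GRU prior, and the coefficient alpha are illustrative assumptions, not the authors' implementation.
```python
import zlib

import torch
import torch.nn as nn


class SequencePrior(nn.Module):
    """Illustrative autoregressive prior p(a_t | a_<t) over discretized actions."""

    def __init__(self, n_actions: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_actions, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def log_prob(self, actions: torch.Tensor) -> torch.Tensor:
        # actions: (B, T) integer-coded actions; predict a_t from a_<t.
        x = self.embed(actions[:, :-1])
        h, _ = self.rnn(x)
        logp = torch.log_softmax(self.head(h), dim=-1)
        return logp.gather(-1, actions[:, 1:].unsqueeze(-1)).squeeze(-1)  # (B, T-1)


def regularized_rewards(rewards, actions, prior, alpha=0.1):
    """Reward shaping r_t + alpha * log p(a_t | a_<t): simpler sequences cost fewer bits."""
    with torch.no_grad():
        bonus = prior.log_prob(actions)        # (B, T-1)
    return rewards[:, 1:] + alpha * bonus


def zlib_complexity(actions) -> int:
    """Alternative simplicity measure: compressed length of the whole action sequence."""
    return len(zlib.compress(bytes(int(a) % 256 for a in actions)))
```
Either measure can be plugged into a standard actor-critic update by maximizing the regularized return instead of the raw return.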
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
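A minimal sketch of the idea just described, assuming a feed-forward critic that scores a state together with a flattened window of future actions; names and shapes are illustrative, not the paper's architecture.
```python
import torch
import torch.nn as nn


class SequenceCritic(nn.Module):
    """Q(s, a_t, ..., a_{t+h-1}): a critic over a short sequence of actions."""

    def __init__(self, obs_dim: int, act_dim: int, horizon: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim * horizon, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, action_seq: torch.Tensor) -> torch.Tensor:
        # obs: (B, obs_dim); action_seq: (B, horizon, act_dim)
        flat = action_seq.flatten(start_dim=1)
        return self.net(torch.cat([obs, flat], dim=-1))
```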
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
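A hedged sketch of the general prompt-tuning pattern referenced above (not DPCPL itself): the pre-trained backbone is frozen and only a small set of prompt embeddings, prepended to the behavior sequence, is learned.
```python
import torch
import torch.nn as nn


class PromptTunedRecommender(nn.Module):
    """Freeze a pre-trained sequence encoder; train only the prompt embeddings."""

    def __init__(self, backbone: nn.Module, embed_dim: int, n_prompts: int = 8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # pre-trained weights stay fixed
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)

    def forward(self, item_embeddings: torch.Tensor) -> torch.Tensor:
        # item_embeddings: (B, T, D) embedded behavior sequence
        prompts = self.prompts.unsqueeze(0).expand(item_embeddings.size(0), -1, -1)
        return self.backbone(torch.cat([prompts, item_embeddings], dim=1))
```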
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
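As a small, hedged illustration of autoregressive pattern completion, the snippet below asks an off-the-shelf causal language model (GPT-2 via Hugging Face transformers) to continue a simple symbolic pattern; the paper studies larger models, so treat this purely as a usage sketch.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A simple symbolic pattern the model is asked to continue token by token.
prompt = "1 2 3, 2 3 4, 3 4 5,"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=12, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```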
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
- Towards Correlated Sequential Rules [4.743965372344134]
High-utility sequential rule mining (HUSRM) explores the confidence, or probability, of predicting the occurrence of consequent sequential patterns.
The existing HUSRM algorithm extracts all eligible rules but neglects the correlation between the generated sequential rules.
We propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM.
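A hedged sketch of the general filtering idea: keep a candidate sequential rule X -> Y only if it clears utility, confidence, and correlation thresholds. The lift-style correlation measure and the thresholds are illustrative assumptions, not the exact measure used by CoUSR.
```python
def keep_rule(utility: float, support_xy: int, support_x: int, support_y: int,
              n_sequences: int, min_utility: float = 50.0,
              min_conf: float = 0.6, min_corr: float = 1.2) -> bool:
    """Accept a rule X -> Y only if it is high-utility, confident, and correlated."""
    confidence = support_xy / support_x
    lift = (support_xy * n_sequences) / (support_x * support_y)  # > 1: positive correlation
    return utility >= min_utility and confidence >= min_conf and lift >= min_corr
```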
arXiv Detail & Related papers (2022-10-27T17:27:23Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
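A hedged sketch of the two-policy scheme, assuming a Gymnasium-style environment API: a guide policy acts for the first h steps of each episode, the learning policy takes over from there, and h is annealed toward zero as training progresses. Function names are illustrative.
```python
def jumpstart_episode(env, guide_policy, learner_policy, h: int):
    """Roll in with the guide policy for h steps, then hand control to the learner."""
    obs, _ = env.reset()
    done, t, transitions = False, 0, []
    while not done:
        policy = guide_policy if t < h else learner_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, terminated))
        obs, done, t = next_obs, terminated or truncated, t + 1
    return transitions  # fed to any off-policy update for the learner policy
```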
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Deep Reinforcement Learning with Adjustments [10.244120641608447]
We propose a new Q-learning algorithm for continuous action spaces that can bridge control and RL algorithms.
Our method can learn complex policies to achieve long-term goals and at the same time it can be easily adjusted to address short-term requirements.
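Since the summary does not spell out the mechanism, the sketch below shows only a generic workaround for Q-learning with continuous actions: approximate argmax_a Q(s, a) by scoring a batch of sampled candidate actions. This is a common pattern, not necessarily the cited paper's method.
```python
import torch


def greedy_continuous_action(q_net, obs: torch.Tensor, act_dim: int,
                             n_candidates: int = 256,
                             act_low: float = -1.0, act_high: float = 1.0):
    """Approximate the greedy action by evaluating Q on sampled candidates."""
    candidates = torch.empty(n_candidates, act_dim).uniform_(act_low, act_high)
    obs_batch = obs.unsqueeze(0).expand(n_candidates, -1)   # obs: (obs_dim,)
    q_values = q_net(obs_batch, candidates).squeeze(-1)     # assumed q_net(s, a) -> (N, 1)
    return candidates[q_values.argmax()]
```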
arXiv Detail & Related papers (2021-09-28T03:35:09Z)
- Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, attaining up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
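A hedged sketch of the information-regularization pattern referenced above: the per-step reward is penalized by the KL divergence between the encoder's latent distribution and a learned prior, so the policy pays for every bit of observation it uses. Distribution shapes and the coefficient beta are illustrative.
```python
import torch.distributions as D


def bottleneck_reward(reward, encoder_dist: D.Normal, prior_dist: D.Normal, beta: float = 0.01):
    """r_t - beta * KL(q(z|s) || p(z)): using fewer bits incurs a smaller penalty."""
    bits = D.kl_divergence(encoder_dist, prior_dist).sum(-1)
    return reward - beta * bits
```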
arXiv Detail & Related papers (2021-09-07T17:29:34Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
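A hedged sketch of the soft Q-learning view of decoding: tokens are actions, the soft state value is a temperature-scaled log-sum-exp over token Q-values, and the Bellman target mixes the task reward with that value. Shapes and the temperature are illustrative.
```python
import torch


def soft_q_target(reward: float, next_token_q: torch.Tensor,
                  tau: float = 1.0, gamma: float = 1.0, done: bool = False):
    """Soft Bellman target for one decoding step; next_token_q has shape (vocab_size,)."""
    soft_value = tau * torch.logsumexp(next_token_q / tau, dim=-1)
    return reward + (0.0 if done else gamma * soft_value)
```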
arXiv Detail & Related papers (2021-06-14T18:48:40Z)