APS: Active Pretraining with Successor Features
- URL: http://arxiv.org/abs/2108.13956v1
- Date: Tue, 31 Aug 2021 16:30:35 GMT
- Title: APS: Active Pretraining with Successor Features
- Authors: Hao Liu, Pieter Abbeel
- Abstract summary: We show that by reinterpreting and combining variational successor features (Hansen et al., 2020) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior via variational successor features.
- Score: 96.24533716878055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new unsupervised pretraining objective for reinforcement
learning. During the unsupervised reward-free pretraining phase, the agent
maximizes mutual information between tasks and states induced by the policy.
Our key contribution is a novel lower bound of this intractable quantity. We
show that by reinterpreting and combining variational successor
features~\citep{Hansen2020Fast} with nonparametric entropy
maximization~\citep{liu2021behavior}, the intractable mutual information can be
efficiently optimized. The proposed method Active Pretraining with Successor
Feature (APS) explores the environment via nonparametric entropy maximization,
and the explored data can be efficiently leveraged to learn behavior by
variational successor features. APS addresses the limitations of existing
mutual information maximization based and entropy maximization based
unsupervised RL, and combines the best of both worlds. When evaluated on the
Atari 100k data-efficiency benchmark, our approach significantly outperforms
previous methods combining unsupervised pretraining with task-specific
finetuning.
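To make the abstract's claim concrete, the sketch below spells out the decomposition it refers to: the task-state mutual information splits into a state-entropy term (handled nonparametrically) and a conditional term (handled by a variational successor-feature model). The notation (task vector z, feature map phi, variational distribution q) is ours and follows the standard variational treatment; it is an illustration consistent with the abstract, not an excerpt from the paper.

```latex
% Sketch of the pretraining objective under the standard variational bound.
% z: task vector from a fixed prior; s: states induced by the z-conditioned policy.
\begin{align}
  I(s; z) \;=\; H(s) - H(s \mid z)
          \;\ge\; H(s) \;+\; \mathbb{E}_{p(s, z)}\!\left[ \log q(s \mid z) \right],
\end{align}
% since -H(s|z) = E[log p(s|z)] >= E[log q(s|z)] for any variational q.
% H(s) is then estimated with a nonparametric particle-based (kNN) estimator,
% and the variational term is realized with a successor-feature model whose
% per-state reward takes the form phi(s)^T z, as in the VISR line of work.
```

In code, this reduces to a per-state intrinsic reward that sums an exploration bonus with an exploitation term. The following is a minimal NumPy sketch under those assumptions; the function name, shapes, and the choice of k are ours, not the authors' implementation.

```python
import numpy as np

def aps_style_intrinsic_reward(phi_s, z, buffer_phi, k=12, eps=1e-6):
    """Illustrative APS-style intrinsic reward (a sketch, not the authors' code).

    phi_s:      (d,) successor-feature embedding of the current state, L2-normalized.
    z:          (d,) task vector sampled from the unit sphere.
    buffer_phi: (N, d) embeddings of recently visited states (e.g. a replay batch).
    """
    # Exploration bonus: particle-based entropy estimate, growing with the
    # average distance from phi_s to its k nearest neighbors in the buffer.
    dists = np.linalg.norm(buffer_phi - phi_s, axis=1)
    knn_dists = np.sort(dists)[:k]
    r_explore = np.log(eps + knn_dists.mean())

    # Exploitation term: the variational successor-feature reward, i.e.
    # log q(s | z) up to normalization when q(s | z) is proportional to exp(phi(s)^T z).
    r_exploit = float(phi_s @ z)

    return r_explore + r_exploit
```

The exploration bonus is what makes the pretraining "active", while the successor-feature term is what lets the explored data be efficiently leveraged to learn behavior, matching the two halves of the abstract.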
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- Adversarial Style Transfer for Robust Policy Optimization in Deep Reinforcement Learning [13.652106087606471]
This paper proposes an algorithm that aims to improve generalization for reinforcement learning agents by removing overfitting to confounding features.
A policy network updates its parameters to minimize the effect of such perturbations, thus staying robust while maximizing the expected future reward.
We evaluate our approach on Procgen and Distracting Control Suite for generalization and sample efficiency.
arXiv Detail & Related papers (2023-08-29T18:17:35Z)
- Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z)
- Multi-Augmentation for Efficient Visual Representation Learning for Self-supervised Pre-training [1.3733988835863333]
We propose Multi-Augmentations for Self-Supervised Learning (MA-SSRL), which fully searches over various augmentation policies to build the entire pipeline.
MA-SSRL successfully learns the invariant feature representation and presents an efficient, effective, and adaptable data augmentation pipeline for self-supervised pre-training.
arXiv Detail & Related papers (2022-05-24T04:18:39Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery [88.97076030698433]
We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery.
CIC explicitly incentivizes diverse behaviors by maximizing state entropy.
We find that CIC substantially improves over prior unsupervised skill discovery methods.
arXiv Detail & Related papers (2022-02-01T00:36:29Z)