APS: Active Pretraining with Successor Features
- URL: http://arxiv.org/abs/2108.13956v1
- Date: Tue, 31 Aug 2021 16:30:35 GMT
- Title: APS: Active Pretraining with Successor Features
- Authors: Hao Liu, Pieter Abbeel
- Abstract summary: We show that by reinterpreting and combining variational successor features with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
- Score: 96.24533716878055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new unsupervised pretraining objective for reinforcement
learning. During the unsupervised reward-free pretraining phase, the agent
maximizes mutual information between tasks and states induced by the policy.
Our key contribution is a novel lower bound of this intractable quantity. We
show that by reinterpreting and combining variational successor
features~\citep{Hansen2020Fast} with nonparametric entropy
maximization~\citep{liu2021behavior}, the intractable mutual information can be
efficiently optimized. The proposed method, Active Pretraining with Successor
Features (APS), explores the environment via nonparametric entropy maximization,
and the explored data can be efficiently leveraged to learn behavior by
variational successor features. APS addresses the limitations of existing
mutual information maximization based and entropy maximization based
unsupervised RL, and combines the best of both worlds. When evaluated on the
Atari 100k data-efficiency benchmark, our approach significantly outperforms
previous methods combining unsupervised pretraining with task-specific
finetuning.
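In code terms, the lower bound decomposes into two intrinsic rewards: a nonparametric, particle-based entropy estimate over encoded states that drives exploration, and a variational successor-feature term log q(z|s) that scores how well a state identifies the sampled task. Below is a minimal PyTorch sketch of that decomposition; the k-nearest-neighbor entropy estimator, function names, and shapes are our assumptions from the abstract, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def aps_intrinsic_reward(phi_s, z, phi_particles, k=12):
    """Hedged sketch of an APS-style intrinsic reward (illustrative, not the
    authors' code).

    phi_s:         (B, D) successor features phi(s) of the current batch
    z:             (B, D) task vectors sampled from the unit sphere
    phi_particles: (N, D) encoded states used as particles for the entropy term
    """
    # Exploitation: variational lower bound on log q(z|s); with unit-norm
    # successor features this reduces to an inner product with the task vector.
    r_exploit = (F.normalize(phi_s, dim=-1) * z).sum(dim=-1)

    # Exploration: nonparametric (particle-based) entropy estimate -- reward
    # grows with the average distance to the k nearest neighbors.
    dists = torch.cdist(phi_s, phi_particles)        # (B, N) pairwise distances
    knn, _ = dists.topk(k, dim=-1, largest=False)    # k smallest distances
    r_explore = torch.log(1.0 + knn.mean(dim=-1))    # log keeps rewards stable

    return r_explore + r_exploit
```

During pretraining the agent would maximize this combined reward; at finetuning time, the task vector z can be inferred from downstream rewards via the successor features, which is what lets the explored data transfer efficiently.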
Related papers
- Dissecting Deep RL with High Update Ratios: Combatting Value Divergence [21.282292112642747]
We show that deep reinforcement learning algorithms can retain their ability to learn without resetting network parameters.
We employ a simple unit-ball normalization that enables learning under large update ratios.
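Unit-ball normalization itself is a one-line operation; where it sits in the network is not specified in the summary, so the placement suggested here is an assumption.

```python
import torch

def unit_ball_normalize(features: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Bound the norm of (e.g., penultimate-layer) features so value estimates
    # cannot diverge under large update-to-data ratios -- our reading of the
    # summary's claim.
    return features / (features.norm(dim=-1, keepdim=True) + eps)
```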
arXiv Detail & Related papers (2024-03-09T19:56:40Z)
- Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
- Adversarial Style Transfer for Robust Policy Optimization in Deep Reinforcement Learning [13.652106087606471]
This paper proposes an algorithm that aims to improve generalization for reinforcement learning agents by removing overfitting to confounding features.
A policy network updates its parameters to minimize the effect of such perturbations, thus staying robust while maximizing the expected future reward.
We evaluate our approach on Procgen and Distracting Control Suite for generalization and sample efficiency.
arXiv Detail & Related papers (2023-08-29T18:17:35Z)
- Robust Policy Optimization in Deep Reinforcement Learning [16.999444076456268]
In continuous action domains, a parameterized action distribution allows easy control of exploration.
In particular, we propose an algorithm called Robust Policy Optimization (RPO), which leverages a perturbed distribution.
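As we read it, the perturbation keeps a Gaussian policy stochastic by shifting its mean with bounded uniform noise at each sampling step; a minimal sketch, with the noise range alpha as an illustrative hyperparameter:

```python
import torch
from torch.distributions import Normal

def rpo_action_distribution(mean, std, alpha=0.5):
    # Robust Policy Optimization (sketch): perturb the Gaussian mean with
    # uniform noise in [-alpha, alpha] so the policy keeps exploring even
    # as the learned standard deviation shrinks.
    noise = torch.empty_like(mean).uniform_(-alpha, alpha)
    return Normal(mean + noise, std)
```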
We evaluate our method on various continuous control tasks from DeepMind Control, OpenAI Gym, Pybullet, and IsaacGym.
arXiv Detail & Related papers (2022-12-14T22:43:56Z)
- Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z)
- Multi-Augmentation for Efficient Visual Representation Learning for Self-supervised Pre-training [1.3733988835863333]
We propose Multi-Augmentations for Self-Supervised Learning (MA-SSRL), which fully searches over augmentation policies to build the entire pipeline.
MA-SSRL successfully learns the invariant feature representation and presents an efficient, effective, and adaptable data augmentation pipeline for self-supervised pre-training.
arXiv Detail & Related papers (2022-05-24T04:18:39Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion process (MPP).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the sender's optimal long-term cumulative utility.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery [88.97076030698433]
We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery.
CIC explicitly incentivizes diverse behaviors by maximizing state entropy.
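A CIC-style objective can be sketched as an InfoNCE loss that ties each transition to the skill that produced it; the embedding inputs and temperature below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def cic_contrastive_loss(transition_emb, skill_emb, temperature=0.5):
    """Hedged sketch of a contrastive skill-discovery loss.

    transition_emb: (B, D) embeddings of state transitions g(s, s')
    skill_emb:      (B, D) embeddings of the skills z that generated them
    """
    q = F.normalize(transition_emb, dim=-1)
    k = F.normalize(skill_emb, dim=-1)
    logits = q @ k.t() / temperature                  # (B, B) similarities
    labels = torch.arange(q.size(0), device=q.device)
    # InfoNCE: each transition should match its own skill, not the others'.
    return F.cross_entropy(logits, labels)
```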
We find that CIC substantially improves over prior unsupervised skill discovery methods.
arXiv Detail & Related papers (2022-02-01T00:36:29Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
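The mechanism, as we understand it, is expectile regression: the value function is regressed toward an upper expectile of Q-values computed only on dataset actions, so no out-of-distribution action is ever queried. A minimal sketch with tau as the expectile:

```python
import torch

def expectile_value_loss(q_values, v_values, tau=0.7):
    # Implicit Q-Learning (sketch): asymmetric squared loss |tau - 1(u < 0)| * u^2
    # pushes V(s) toward an upper expectile of Q(s, a) over dataset actions,
    # avoiding evaluation of unseen actions entirely.
    diff = q_values - v_values
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()
```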
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.