Augmenting Unsupervised Reinforcement Learning with Self-Reference
- URL: http://arxiv.org/abs/2311.09692v1
- Date: Thu, 16 Nov 2023 09:07:34 GMT
- Title: Augmenting Unsupervised Reinforcement Learning with Self-Reference
- Authors: Andrew Zhao, Erle Zhu, Rui Lu, Matthieu Lin, Yong-Jin Liu, Gao Huang
- Abstract summary: Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
- Score: 63.68018737038331
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans possess the ability to draw on past experiences explicitly when
learning new tasks and applying them accordingly. We believe this capacity for
self-referencing is especially advantageous for reinforcement learning agents
in the unsupervised pretrain-then-finetune setting. During pretraining, an
agent's past experiences can be explicitly utilized to mitigate the
nonstationarity of intrinsic rewards. In the finetuning phase, referencing
historical trajectories prevents the unlearning of valuable exploratory
behaviors. Motivated by these benefits, we propose the Self-Reference (SR)
approach, an add-on module explicitly designed to leverage historical
information and enhance agent performance within the pretrain-finetune
paradigm. Our approach achieves state-of-the-art results in terms of
Interquartile Mean (IQM) performance and Optimality Gap reduction on the
Unsupervised Reinforcement Learning Benchmark for model-free methods, recording
an 86% IQM and a 16% Optimality Gap. Additionally, it improves current
algorithms by up to 17% IQM and reduces the Optimality Gap by 31%. Beyond
performance enhancement, the Self-Reference add-on also increases sample
efficiency, a crucial attribute for real-world applications.
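The two headline metrics (IQM and Optimality Gap) come from the rliable-style evaluation protocol commonly used with the Unsupervised Reinforcement Learning Benchmark. Below is a minimal sketch of how they are typically computed over normalized scores; the exact aggregation and normalization used in the paper may differ, and the example scores are hypothetical.

```python
import numpy as np

def iqm(scores: np.ndarray) -> float:
    """Interquartile Mean: the mean of the middle 50% of scores,
    i.e. a 25%-trimmed mean over all runs and tasks."""
    flat = np.sort(scores.ravel())
    n = len(flat)
    trimmed = flat[int(np.floor(n * 0.25)): int(np.ceil(n * 0.75))]
    return float(trimmed.mean())

def optimality_gap(scores: np.ndarray, gamma: float = 1.0) -> float:
    """Mean amount by which normalized scores fall short of gamma;
    scores above gamma earn no extra credit."""
    return float(np.mean(np.maximum(gamma - scores, 0.0)))

# Hypothetical normalized scores for 10 runs x 12 tasks.
rng = np.random.default_rng(0)
scores = rng.uniform(0.5, 1.1, size=(10, 12))
print(f"IQM: {iqm(scores):.3f}, Optimality Gap: {optimality_gap(scores):.3f}")
```

Under these definitions, a higher IQM and a lower Optimality Gap are both better, which matches how the abstract reports its results.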
Related papers
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further employed to maintain stability in terms of zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in few-shot image classification scenarios.
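The summary does not spell out the parameterization, so here is a generic, minimal sketch of orthogonal fine-tuning in the OFT style: a frozen pretrained weight matrix is rotated by a learned orthogonal matrix built from a skew-symmetric parameter via the Cayley transform. All names and shapes are illustrative assumptions, not OrthSR's actual code.

```python
import numpy as np

def cayley(A: np.ndarray) -> np.ndarray:
    """Map a skew-symmetric matrix A to an orthogonal matrix R
    via the Cayley transform: R = (I - A) @ inv(I + A)."""
    I = np.eye(A.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

d_out, d_in = 8, 4
W_pretrained = np.random.randn(d_out, d_in)   # frozen pretrained weights
theta = np.zeros((d_out, d_out))              # trainable; zero = identity rotation

# Only the skew-symmetric part of theta is used, so R is always orthogonal.
A = theta - theta.T
R = cayley(A)
W_finetuned = R @ W_pretrained

# At initialization R == I, so behavior matches the pretrained model exactly,
# and orthogonality of R preserves pairwise angles among the weight rows.
assert np.allclose(R, np.eye(d_out))
assert np.allclose(R @ R.T, np.eye(d_out))
```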
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models [37.172662930947446]
Language models (LMs) are potentially vulnerable to extraction attacks, which represent a significant privacy risk.
We propose Privacy Protection via Optimal Parameters (POP), a novel unlearning method that effectively forgets the target token sequences from the pretrained LM.
POP exhibits remarkable retention performance post-unlearning across 9 classification and 4 dialogue benchmarks, outperforming the state-of-the-art by a large margin.
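POP's parameter-approximation update is not described in this summary. Purely as a point of comparison, below is the naive gradient-ascent unlearning baseline that such methods aim to improve upon: maximize the loss on the target token sequence so the model can no longer emit it. The tiny output-head "model" and all tensors are illustrative, not POP's method.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 50, 16
W = rng.normal(0, 0.1, (dim, vocab))      # toy LM output head (our only "parameters")
h = rng.normal(size=(5, dim))             # hidden states for the sequence to forget
target = rng.integers(0, vocab, size=5)   # token ids of the sequence to forget

def forget_loss_and_grad(W):
    """Mean NLL of the target tokens and its analytic gradient w.r.t. W."""
    logits = h @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    nll = -np.log(probs[np.arange(len(target)), target]).mean()
    d_logits = probs.copy()
    d_logits[np.arange(len(target)), target] -= 1.0   # softmax cross-entropy grad
    return nll, h.T @ d_logits / len(target)

# Gradient *ascent* on the forget loss: the target sequence becomes unlikely.
for _ in range(50):
    _, grad = forget_loss_and_grad(W)
    W += 0.5 * grad                        # ascend, i.e. increase NLL of the target

loss, _ = forget_loss_and_grad(W)
print(f"NLL of target sequence after unlearning: {loss:.2f} (higher = forgotten)")
```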
arXiv Detail & Related papers (2024-06-20T08:12:49Z)
- Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
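Of the two ingredients SEER names, label smoothing is the more mechanical: hard binary preference labels are softened before training the reward model. A minimal sketch with a Bradley-Terry-style preference predictor follows; the smoothing coefficient and the toy returns are assumptions, not SEER's settings.

```python
import numpy as np

def smoothed_preference_loss(ret_a, ret_b, prefer_a, eps=0.1):
    """Cross-entropy between a Bradley-Terry preference predictor and
    smoothed binary labels: hard {0, 1} targets become {eps, 1 - eps}."""
    # P(segment a preferred) under the Bradley-Terry model
    p_a = 1.0 / (1.0 + np.exp(-(ret_a - ret_b)))
    y = np.where(prefer_a, 1.0 - eps, eps)   # smoothed target for "a preferred"
    return float(-np.mean(y * np.log(p_a) + (1 - y) * np.log(1 - p_a)))

# Hypothetical reward-model returns for 4 segment pairs, with annotator labels.
ret_a = np.array([1.2, 0.3, 2.0, -0.5])
ret_b = np.array([0.8, 0.9, 1.5,  0.1])
prefer_a = np.array([True, False, True, False])
print(f"loss: {smoothed_preference_loss(ret_a, ret_b, prefer_a):.3f}")
```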
arXiv Detail & Related papers (2024-05-29T01:49:20Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
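One common way to obtain an intrinsic reward "for free" from a predictive model that is already being trained for control is to reuse its one-step prediction error as a novelty bonus, adding no new parameters. The sketch below illustrates that generic idea, not necessarily this paper's exact derivation; the linear dynamics model is a stand-in for a learned predictor.

```python
import numpy as np

def intrinsic_reward(model, s, a, s_next):
    """Novelty bonus from the dynamics model's one-step prediction error.
    Reuses the model learned for planning, so no extra parameters are added."""
    s_pred = model(s, a)
    return float(np.sum((s_pred - s_next) ** 2))

# Toy linear dynamics model standing in for a learned predictor.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) * 0.1
B = rng.normal(size=(3, 2)) * 0.1
model = lambda s, a: A @ s + B @ a

s, a = rng.normal(size=3), rng.normal(size=2)
s_next = A @ s + B @ a + rng.normal(scale=0.05, size=3)  # true next state + noise
print(f"r_int = {intrinsic_reward(model, s, a, s_next):.4f}")
```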
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% improvement over GPT-3.5 with CoT on the stance detection task.
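LoT is a prompting/training pipeline rather than a single formula: schematically, a small LM first produces a knowledge-grounded intermediate rationale and then predicts the stance from it. The `llm` callable and the prompt templates below are hypothetical placeholders, not the paper's actual templates.

```python
def ladder_of_thought(llm, text: str, target: str, knowledge: str) -> str:
    """Two-stage sketch: (1) ground an intermediate rationale in external
    knowledge, (2) predict the stance from the text plus that rationale."""
    rationale = llm(
        f"Background knowledge: {knowledge}\n"
        f"Text: {text}\n"
        f"Explain how the text relates to '{target}':"
    )
    stance = llm(
        f"Text: {text}\nRationale: {rationale}\n"
        f"Stance toward '{target}' (favor/against/neutral):"
    )
    return stance

# Usage with any text-in/text-out model, e.g. a locally hosted small LM:
# stance = ladder_of_thought(my_model.generate, tweet, "climate policy", wiki_snippet)
```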
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
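The summary does not give CCLF's formulas; one way to "exploit sample importance" in its spirit is to weight replay sampling by each transition's contrastive disagreement between augmented views. Everything below (the encoding stand-ins, the weighting rule) is an illustrative assumption rather than CCLF's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def contrastive_disagreement(z1: np.ndarray, z2: np.ndarray) -> np.ndarray:
    """Per-sample 'curiosity': 1 - cosine similarity between the encodings
    of two augmented views of the same observation."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    return 1.0 - np.sum(z1 * z2, axis=1)

# Toy encodings of two augmentations of 6 replay observations; rows with
# larger augmentation noise stand in for harder, more novel samples.
z_aug1 = rng.normal(size=(6, 8))
noise_scale = np.array([[0.01], [0.01], [1.0], [1.0], [0.1], [0.1]])
z_aug2 = z_aug1 + rng.normal(scale=noise_scale, size=(6, 8))
curiosity = contrastive_disagreement(z_aug1, z_aug2)

# Sample replay indices proportionally to curiosity: novel samples come first.
probs = curiosity / curiosity.sum()
batch = rng.choice(6, size=4, p=probs, replace=False)
print("sampling probs:", np.round(probs, 3), "batch:", batch)
```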
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
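The confidence-based pseudo-labeling step is easy to sketch: unlabeled segment pairs whose preference prediction is confident enough receive a pseudo-label and join the reward-learning set. The threshold and the example predictor outputs below are assumptions.

```python
import numpy as np

def pseudo_label(p_prefer_a: np.ndarray, tau: float = 0.95):
    """Keep unlabeled pairs where the preference predictor is confident.
    Returns (indices kept, binary pseudo-labels meaning 'a is preferred')."""
    confidence = np.maximum(p_prefer_a, 1.0 - p_prefer_a)
    keep = np.where(confidence >= tau)[0]
    return keep, (p_prefer_a[keep] > 0.5).astype(int)

# Hypothetical predictor outputs on 6 unlabeled segment pairs.
p = np.array([0.99, 0.60, 0.03, 0.55, 0.97, 0.50])
keep, labels = pseudo_label(p)
print("kept pairs:", keep, "pseudo-labels:", labels)  # keeps pairs 0, 2, 4
```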
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al.) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
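Nonparametric entropy terms of this kind are typically estimated with a particle-based k-nearest-neighbor estimator, rewarding states whose feature embedding lies far from its neighbors. The sketch below shows that estimator in isolation; the embedding, k, and the log1p shaping are assumptions, not APS's exact objective.

```python
import numpy as np

def knn_entropy_reward(phi: np.ndarray, k: int = 3) -> np.ndarray:
    """Particle-based entropy bonus: r_i = log(1 + dist(phi_i, k-th NN)).
    States in sparse regions of feature space receive high reward."""
    d = np.linalg.norm(phi[:, None, :] - phi[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distance
    kth = np.sort(d, axis=1)[:, k - 1]          # distance to k-th neighbor
    return np.log1p(kth)

# Feature embeddings of 8 visited states (stand-ins for successor features):
# a dense cluster of 6 plus two outlying states.
rng = np.random.default_rng(0)
phi = np.vstack([rng.normal(0, 0.1, (6, 4)),
                 rng.normal(3, 0.1, (2, 4))])
print(np.round(knn_entropy_reward(phi), 3))     # outliers earn the largest bonus
```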
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
- Representation Learning via Invariant Causal Mechanisms [19.0976564154636]
Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signal by pretraining representations only using unlabeled data.
We show how data augmentations can be more effectively utilized through explicit invariance constraints on the proxy classifiers employed during pretraining.
We propose a novel self-supervised objective, Representation Learning via Invariant Causal Mechanisms (ReLIC), that enforces invariant prediction of proxy targets across augmentations.
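Such an invariance constraint can be written as a penalty forcing the proxy classifier's output distribution to agree across augmentations, e.g. a symmetrized KL term added to the contrastive objective. The rendering below is generic, not ReLIC's exact loss, and the logits are hypothetical.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def invariance_penalty(logits_aug1: np.ndarray, logits_aug2: np.ndarray) -> float:
    """Symmetrized KL between the proxy classifier's predictions under two
    augmentations of the same input; zero iff predictions are invariant."""
    p, q = softmax(logits_aug1), softmax(logits_aug2)
    kl_pq = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    kl_qp = np.sum(q * (np.log(q) - np.log(p)), axis=-1)
    return float(np.mean(kl_pq + kl_qp))

# Proxy-task logits for 4 inputs under two different augmentations.
rng = np.random.default_rng(0)
l1 = rng.normal(size=(4, 10))
l2 = l1 + rng.normal(scale=0.1, size=(4, 10))   # nearly invariant predictions
print(f"penalty: {invariance_penalty(l1, l2):.4f}")  # small, since views agree
```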
arXiv Detail & Related papers (2020-10-15T17:53:37Z)