CCLF: A Contrastive-Curiosity-Driven Learning Framework for
Sample-Efficient Reinforcement Learning
- URL: http://arxiv.org/abs/2205.00943v2
- Date: Tue, 3 May 2022 06:22:54 GMT
- Title: CCLF: A Contrastive-Curiosity-Driven Learning Framework for
Sample-Efficient Reinforcement Learning
- Authors: Chenyu Sun, Hangwei Qian, Chunyan Miao
- Abstract summary: We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
- Score: 56.20123080771364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL), it is challenging to learn directly from
high-dimensional observations, where data augmentation has recently been shown
to remedy this via encoding invariances from raw pixels. Nevertheless, we
empirically find that not all samples are equally important and hence simply
injecting more augmented inputs may instead cause instability in Q-learning. In
this paper, we approach this problem systematically by developing a
model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF), which
can fully exploit sample importance and improve learning efficiency in a
self-supervised manner. Facilitated by the proposed contrastive curiosity, CCLF
is capable of prioritizing the experience replay, selecting the most
informative augmented inputs, and more importantly regularizing the Q-function
as well as the encoder to concentrate more on under-learned data. Moreover, it
encourages the agent to explore with a curiosity-based reward. As a result, the
agent can focus on more informative samples and learn representation
invariances more efficiently, with significantly reduced augmented inputs. We
apply CCLF to several base RL algorithms and evaluate on the DeepMind Control
Suite, Atari, and MiniGrid benchmarks, where our approach demonstrates superior
sample efficiency and learning performances compared with other
state-of-the-art methods.
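For intuition, the contrastive curiosity described in the abstract can be read as a per-sample score of how strongly the encoder still disagrees on two augmented views of the same observation. Below is a minimal sketch of that idea, assuming a CURL-style bilinear contrastive head; the class name, the bilinear matrix W, the encoder interface, and the priority exponent are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

class ContrastiveCuriosity:
    """Hypothetical per-sample curiosity from contrastive disagreement."""

    def __init__(self, encoder, feature_dim):
        self.encoder = encoder  # shared image encoder (e.g., a CNN)
        self.W = torch.nn.Parameter(torch.rand(feature_dim, feature_dim))

    def curiosity(self, aug_a, aug_b):
        """InfoNCE-style loss between two augmentations of the same batch;
        a higher value suggests the invariance is still under-learned."""
        q = self.encoder(aug_a)            # (B, D) query features
        k = self.encoder(aug_b).detach()   # (B, D) key features
        logits = q @ self.W @ k.T          # (B, B) similarity matrix
        labels = torch.arange(q.size(0), device=q.device)
        return F.cross_entropy(logits, labels, reduction="none")

def sample_priorities(curiosity_scores, alpha=0.6, eps=1e-6):
    """Turn curiosity scores into prioritized-replay sampling probabilities."""
    p = (curiosity_scores.detach() + eps) ** alpha
    return p / p.sum()
```

The same score could, in principle, also weight the Q-learning and contrastive losses or be added as an intrinsic exploration bonus, which is how the abstract describes the curiosity being reused across the framework.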
Related papers
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
- Regularization Through Simultaneous Learning: A Case Study on Plant Classification [0.0]
This paper introduces Simultaneous Learning, a regularization approach drawing on principles of Transfer Learning and Multi-task Learning.
We leverage auxiliary datasets alongside the target dataset, UFOP-HVD, to facilitate simultaneous classification guided by a customized loss function.
Remarkably, our approach demonstrates superior performance over models without regularization.
arXiv Detail & Related papers (2023-05-22T19:44:57Z)
- Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one.
The main challenge comes from the "catastrophic forgetting" issue -- the inability to retain previously learnt knowledge while learning new knowledge.
arXiv Detail & Related papers (2022-11-10T05:29:43Z)
- RényiCL: Contrastive Representation Learning with Skew Rényi Divergence [78.15455360335925]
We present a new robust contrastive learning scheme, coined RényiCL, which can effectively manage harder augmentations.
Our method is built upon the variational lower bound of Rényi divergence.
We show that Rényi contrastive learning objectives perform innate hard negative sampling and easy positive sampling simultaneously.
arXiv Detail & Related papers (2022-08-12T13:37:05Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
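As a rough illustration of the confidence-based pseudo-labeling described above, the sketch below assumes a hypothetical preference_predictor that returns the probability that the second trajectory segment is preferred; the threshold and function names are illustrative, not SURF's exact procedure.

```python
import torch

def pseudo_label(preference_predictor, seg_a, seg_b, threshold=0.95):
    """Return 1 (prefer seg_b), 0 (prefer seg_a), or None if unconfident."""
    with torch.no_grad():
        p = float(preference_predictor(seg_a, seg_b))  # P(seg_b preferred)
    if p >= threshold:
        return 1
    if p <= 1.0 - threshold:
        return 0
    return None  # low-confidence pairs are dropped from reward learning
```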
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model [17.41484483119774]
This paper considers a learning framework, the Curiosity Contrastive Forward Dynamics Model (CCFDM), for achieving more sample-efficient reinforcement learning (RL).
CCFDM incorporates a forward dynamics model (FDM) and performs contrastive learning to train its deep convolutional neural network-based image encoder (IE).
During training, CCFDM provides intrinsic rewards based on FDM prediction error, encouraging the curiosity of the RL agent and improving exploration.
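To make this curiosity mechanism concrete, here is a minimal sketch of an intrinsic reward derived from forward-dynamics prediction error in latent space; the module names, network sizes, and the scale factor are assumptions for illustration rather than CCFDM's exact architecture.

```python
import torch
import torch.nn as nn

class ForwardDynamicsModel(nn.Module):
    """Predicts the next latent state from the current latent and action."""

    def __init__(self, latent_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def intrinsic_reward(fdm, encoder, obs, action, next_obs, scale=0.1):
    """Curiosity bonus: per-sample prediction error of the dynamics model."""
    with torch.no_grad():
        z, z_next = encoder(obs), encoder(next_obs)
        return scale * (fdm(z, action) - z_next).pow(2).mean(dim=-1)
```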
arXiv Detail & Related papers (2021-03-15T10:08:52Z)
- Multi-Pretext Attention Network for Few-shot Learning with Self-supervision [37.6064643502453]
We propose a novel augmentation-free method for self-supervised learning, which does not rely on any auxiliary sample.
In addition, we propose the Multi-pretext Attention Network (MAN), which exploits a specific attention mechanism to combine traditional augmentation-reliant methods and our GC.
We evaluate our MAN extensively on miniImageNet and tieredImageNet datasets and the results demonstrate that the proposed method outperforms the state-of-the-art (SOTA) relevant methods.
arXiv Detail & Related papers (2021-03-10T10:48:37Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)