SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning
- URL: http://arxiv.org/abs/2203.10050v1
- Date: Fri, 18 Mar 2022 16:50:38 GMT
- Title: SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning
- Authors: Jongjin Park, Younggyo Seo, Jinwoo Shin, Honglak Lee, Pieter Abbeel,
Kimin Lee
- Abstract summary: We present SURF, a semi-supervised reward learning framework that utilizes a large number of unlabeled samples together with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
- Score: 168.89470249446023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Preference-based reinforcement learning (RL) has shown potential for teaching
agents to perform the target tasks without a costly, pre-defined reward
function by learning the reward from a supervisor's preferences between pairs of
agent behaviors. However, preference-based learning often requires a large
amount of human feedback, making it difficult to apply this approach to various
applications. In supervised learning, by contrast, this data-efficiency problem
has typically been addressed by using unlabeled samples or data augmentation
techniques. Motivated by the recent success of these approaches, we present
SURF, a semi-supervised reward learning framework that utilizes a large number
of unlabeled samples together with data augmentation. In order
to leverage unlabeled samples for reward learning, we infer pseudo-labels of
the unlabeled samples based on the confidence of the preference predictor. To
further improve the label-efficiency of reward learning, we introduce a new
data augmentation that temporally crops consecutive subsequences from the
original behaviors. Our experiments demonstrate that our approach significantly
improves the feedback-efficiency of the state-of-the-art preference-based
method on a variety of locomotion and robotic manipulation tasks.
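A minimal sketch of the two ingredients the abstract describes, assuming the standard Bradley-Terry preference model used throughout preference-based RL; the confidence threshold tau, the shared crop length, and the sampling details are illustrative assumptions rather than the paper's exact pipeline:

```python
import numpy as np

def preference_prob(reward_fn, seg0, seg1):
    """P[seg0 preferred over seg1] under a Bradley-Terry model on summed
    predicted rewards (the standard choice in preference-based RL)."""
    r0 = sum(reward_fn(s, a) for s, a in seg0)
    r1 = sum(reward_fn(s, a) for s, a in seg1)
    return 1.0 / (1.0 + np.exp(r1 - r0))  # sigmoid(r0 - r1)

def pseudo_label(reward_fn, seg0, seg1, tau=0.95):
    """Pseudo-label an unlabeled pair only when the predictor is
    confident; tau is an illustrative confidence threshold."""
    p = preference_prob(reward_fn, seg0, seg1)
    if max(p, 1.0 - p) < tau:
        return None               # discard low-confidence pairs
    return 0 if p > 0.5 else 1    # index of the pseudo-preferred segment

def temporal_crop(seg0, seg1, min_len, rng):
    """Temporal cropping: sample consecutive subsequences of a shared
    random length, with independent offsets, from each segment."""
    h = int(rng.integers(min_len, min(len(seg0), len(seg1)) + 1))
    i0 = int(rng.integers(0, len(seg0) - h + 1))
    i1 = int(rng.integers(0, len(seg1) - h + 1))
    return seg0[i0:i0 + h], seg1[i1:i1 + h]

# Example: rng = np.random.default_rng(0); segments are lists of (state, action).
```

Under this reading, unlabeled pairs that survive pseudo_label are mixed into the labeled preference data, and temporal_crop is applied to pairs before each reward-model update.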
Related papers
- Intent-Enhanced Data Augmentation for Sequential Recommendation [20.639934432829325]
We propose an intent-enhanced data augmentation method for sequential recommendation (IESRec).
IESRec constructs positive and negative samples based on user behavior sequences through intent-segment insertion.
The generated positive and negative samples are used to build a contrastive loss function, enhancing recommendation performance through self-supervised training.
arXiv Detail & Related papers (2024-10-11T07:23:45Z)
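For the entry above, a hedged sketch of how generated positive and negative samples typically feed a contrastive objective; the InfoNCE-style form and the temperature are assumptions, since the summary does not give IESRec's exact loss:

```python
import numpy as np

def contrastive_loss(anchor, positive, negative, temperature=0.5):
    """InfoNCE-style contrastive loss for one anchor sequence embedding,
    one generated positive and one generated negative; the intent-segment
    insertion that produces them happens upstream of this function."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive), cos(anchor, negative)]) / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # pull positive, push negative
```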
- Active Learning to Guide Labeling Efforts for Question Difficulty Estimation [1.0514231683620516]
Transformer-based neural networks achieve state-of-the-art performance, primarily through supervised methods, with only an isolated study in unsupervised learning.
This work bridges the research gap by exploring active learning for QDE, a supervised human-in-the-loop approach.
Experiments demonstrate that active learning with PowerVariance acquisition achieves a performance close to fully supervised models after labeling only 10% of the training data.
arXiv Detail & Related papers (2024-09-14T02:02:42Z)
- Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
arXiv Detail & Related papers (2024-05-29T01:49:20Z)
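A small sketch of the label-smoothing half of the entry above, applied to binary preference targets as is common in PbRL; the smoothing coefficient eps is illustrative, and SEER's policy regularization is not shown:

```python
import numpy as np

def smoothed_preference_loss(p0, label, eps=0.1):
    """Binary cross-entropy on a preference label with label smoothing:
    the hard target (1, 0) / (0, 1) is softened to (1-eps, eps) /
    (eps, 1-eps), which discourages overconfident reward predictions.
    p0 is the predicted probability that segment 0 is preferred."""
    t0 = (1.0 - eps) if label == 0 else eps   # smoothed target for segment 0
    return -(t0 * np.log(p0) + (1.0 - t0) * np.log(1.0 - p0))
```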
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is estimated to incur a high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
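A hedged sketch of loss-estimation-based acquisition as the entry above describes it, using the output discrepancy between two training snapshots as the loss proxy suggested by the paper's title; the L2 distance and the callable interface are assumptions:

```python
import numpy as np

def select_for_labeling(pool, model_old, model_new, budget):
    """Loss-estimation-based acquisition: score each unlabeled sample by
    the output discrepancy between two training snapshots (a proxy for
    its current loss) and query the highest-scoring ones.
    model_old / model_new are callables mapping a sample to an output vector."""
    scores = [float(np.linalg.norm(model_new(x) - model_old(x))) for x in pool]
    ranked = np.argsort(scores)[::-1]        # highest estimated loss first
    return [pool[i] for i in ranked[:budget]]
```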
- Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
- ALLSH: Active Learning Guided by Local Sensitivity and Hardness [98.61023158378407]
We propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function.
Our method achieves consistent gains over the commonly used active learning strategies in various classification tasks.
arXiv Detail & Related papers (2022-05-10T15:39:11Z)
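A sketch of a sensitivity-and-hardness acquisition score in the spirit of the entry above; the KL-plus-entropy combination and the equal weighting are assumptions, not ALLSH's exact formulation:

```python
import numpy as np

def sensitivity_hardness_score(predict, x, augment):
    """Acquisition score combining local sensitivity (KL divergence between
    the predictions on x and on a local perturbation augment(x)) with
    hardness (predictive entropy); equal weighting is an assumption."""
    p = predict(x)                     # class-probability vector
    q = predict(augment(x))
    kl = float(np.sum(p * np.log(p / q)))
    entropy = float(-np.sum(p * np.log(p)))
    return kl + entropy                # query the highest-scoring samples
```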
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Squeezing Backbone Feature Distributions to the Max for Efficient Few-Shot Learning [3.1153758106426603]
Few-shot classification is a challenging problem due to the uncertainty caused by using few labelled samples.
We propose a novel transfer-based method which aims at processing the feature vectors so that they become closer to Gaussian-like distributions.
In the case of transductive few-shot learning where unlabelled test samples are available during training, we also introduce an optimal-transport inspired algorithm to boost even further the achieved performance.
arXiv Detail & Related papers (2021-10-18T16:29:17Z)
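For the entry above, a minimal sketch of one common way to push nonnegative backbone features toward Gaussian-like distributions, an elementwise power transform followed by L2 normalization; the exponent beta and the normalization step are illustrative assumptions, not necessarily the paper's exact transform:

```python
import numpy as np

def power_transform(features, beta=0.5, eps=1e-6):
    """Elementwise power transform pushing skewed, nonnegative backbone
    features toward Gaussian-like distributions, followed by L2
    normalization; beta=0.5 is an illustrative exponent."""
    f = np.power(features + eps, beta)   # compress large values, reduce skew
    return f / np.linalg.norm(f, axis=-1, keepdims=True)
```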
- Ask-n-Learn: Active Learning via Reliable Gradient Representations for Image Classification [29.43017692274488]
Deep predictive models rely on human supervision in the form of labeled training data.
We propose Ask-n-Learn, an active learning approach based on gradient embeddings obtained using the pseudo-labels estimated in each iteration of the algorithm.
arXiv Detail & Related papers (2020-09-30T05:19:56Z)
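A hedged sketch of a pseudo-label gradient embedding in the style the entry above describes, using the well-known last-layer construction (as in BADGE); whether Ask-n-Learn builds its embeddings exactly this way is an assumption:

```python
import numpy as np

def gradient_embedding(probs, penult):
    """Last-layer gradient embedding evaluated at the pseudo-label
    argmax(probs): for cross-entropy, the gradient w.r.t. the final
    linear layer is (probs - onehot(y)) outer penultimate_features."""
    y = int(np.argmax(probs))            # pseudo-label for this sample
    err = probs.copy()
    err[y] -= 1.0                        # dL/dlogits at the pseudo-label
    return np.outer(err, penult).ravel() # flattened embedding for selection
```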
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To tackle the scale of the created dataset, we propose to apply a dataset distillation strategy to compress it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.