Responsible Active Learning via Human-in-the-loop Peer Study
- URL: http://arxiv.org/abs/2211.13587v1
- Date: Thu, 24 Nov 2022 13:18:27 GMT
- Title: Responsible Active Learning via Human-in-the-loop Peer Study
- Authors: Yu-Tong Cao, Jingya Wang, Baosheng Yu, Dacheng Tao
- Abstract summary: We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion.
- Score: 88.01358655203441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning has been proposed to reduce data annotation effort by manually labelling only representative data samples for training. Meanwhile, recent active learning applications have benefited greatly from cloud computing services, which offer not only sufficient computational resources but also crowdsourcing frameworks that include many humans in the active learning loop. However, previous active learning methods, which require passing large-scale unlabelled data to the cloud, may raise significant data privacy issues. To mitigate this risk, we propose a responsible active learning method, Peer Study Learning (PSL), that simultaneously preserves data privacy and improves model stability. Specifically, we first introduce a human-in-the-loop teacher-student architecture that isolates unlabelled data from the task learner (teacher) on the cloud side by maintaining an active learner (student) on the client side. During training, the task learner instructs the lightweight active learner, which in turn provides feedback on the active sampling criterion. To further enhance the active learner with large-scale unlabelled data, we introduce multiple peer students into the active learner and train them with a novel learning paradigm comprising In-Class Peer Study on labelled data and Out-of-Class Peer Study on unlabelled data. Lastly, we devise a discrepancy-based active sampling criterion, Peer Study Feedback, that exploits the variability among peer students to select the most informative data and thereby improve model stability. Extensive experiments demonstrate the superiority of the proposed PSL over a wide range of active learning methods in both standard and sensitive-protection settings.
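The abstract does not include code, but the idea behind a discrepancy-based sampling criterion like Peer Study Feedback can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes several peer models' softmax outputs are available and uses their prediction variance as the disagreement score. All function names are illustrative.

```python
import numpy as np

def peer_discrepancy_scores(peer_probs: np.ndarray) -> np.ndarray:
    """Score each unlabelled sample by how much the peer students disagree.

    peer_probs: shape (n_peers, n_samples, n_classes), each peer's softmax
    outputs. The score is the mean per-class variance across peers; a higher
    score means more disagreement among the peers.
    """
    return peer_probs.var(axis=0).mean(axis=-1)

def select_for_labelling(peer_probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` most disagreed-upon samples."""
    scores = peer_discrepancy_scores(peer_probs)
    return np.argsort(scores)[::-1][:budget]

# Toy example: 3 peers, 4 unlabelled samples, 2 classes.
probs = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.1, 0.9]],
    [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]],
    [[0.9, 0.1], [0.1, 0.9], [0.3, 0.7], [0.1, 0.9]],
])
picked = select_for_labelling(probs, budget=2)  # sample 1 draws the most disagreement
```

Samples on which all peers agree (0 and 3) score near zero and are skipped, while the sample the peers split on is queried first.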
Related papers
- Active Learning to Guide Labeling Efforts for Question Difficulty Estimation [1.0514231683620516]
Transformer-based neural networks achieve state-of-the-art performance, primarily through supervised methods, with only isolated work on unsupervised learning.
This work bridges the research gap by exploring active learning for QDE, a supervised human-in-the-loop approach.
Experiments demonstrate that active learning with PowerVariance acquisition achieves a performance close to fully supervised models after labeling only 10% of the training data.
arXiv Detail & Related papers (2024-09-14T02:02:42Z)
- Advancing Deep Active Learning & Data Subset Selection: Unifying Principles with Information-Theory Intuitions [3.0539022029583953]
This thesis aims to enhance the practicality of deep learning by improving the label and training efficiency of deep learning models.
We investigate data subset selection techniques, specifically active learning and active sampling, grounded in information-theoretic principles.
arXiv Detail & Related papers (2024-01-09T01:41:36Z)
- Model Uncertainty based Active Learning on Tabular Data using Boosted Trees [0.4667030429896303]
Supervised machine learning relies on the availability of good labelled data for model training.
Active learning is a sub-field of machine learning that helps obtain labelled data efficiently.
arXiv Detail & Related papers (2023-10-30T14:29:53Z)
- Active Learning with Contrastive Pre-training for Facial Expression Recognition [19.442685015494316]
We study 8 recent active learning methods on three public FER datasets.
Our findings show that existing active learning methods do not perform well in the context of FER.
We propose contrastive self-supervised pre-training, which first learns the underlying representations based on the entire unlabelled dataset.
arXiv Detail & Related papers (2023-07-06T03:08:03Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
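The confidence-based pseudo-labelling step described above can be sketched generically. This is not SURF's actual implementation: the threshold value and function names are placeholders, and the "preference predictor" is reduced to an array of predicted probabilities that the first segment in each pair is preferred.

```python
import numpy as np

def pseudo_label(confidences: np.ndarray, threshold: float = 0.95):
    """Keep only the unlabelled pairs the predictor is confident about.

    confidences: predicted probability that the first segment of each pair is
    preferred. Returns (indices of kept pairs, their hard pseudo-labels),
    where label 1 means "first segment preferred".
    """
    # Certainty is the probability assigned to the predicted winner.
    certainty = np.maximum(confidences, 1.0 - confidences)
    keep = np.where(certainty >= threshold)[0]
    labels = (confidences[keep] >= 0.5).astype(int)
    return keep, labels

conf = np.array([0.99, 0.60, 0.02, 0.50])
keep, labels = pseudo_label(conf)  # keeps the two confident pairs, 0 and 2
```

Pairs near 0.5 (the predictor is unsure which segment wins) are discarded rather than given noisy pseudo-labels.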
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z)
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
- Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering [71.15403434929915]
We show that across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection.
We identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn.
We show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases.
arXiv Detail & Related papers (2021-07-06T00:52:11Z)
- Active Learning: Problem Settings and Recent Developments [2.1574781022415364]
This paper explains the basic problem settings of active learning and recent research trends.
In particular, research on learning acquisition functions to select samples from the data for labeling, theoretical work on active learning algorithms, and stopping criteria for sequential data acquisition are highlighted.
arXiv Detail & Related papers (2020-12-08T05:24:06Z)
- Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and a larger query size.
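One reading of "partial uncertainty sampling" is to score only a random subset of the unlabelled pool each round instead of the whole pool. The sketch below illustrates that idea with a standard predictive-entropy criterion; the function names, the subset fraction, and the choice of entropy are assumptions, not details from the paper.

```python
import numpy as np

def entropy(probs: np.ndarray) -> np.ndarray:
    # Predictive entropy per sample; higher = more uncertain.
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def partial_uncertainty_query(pool_probs, query_size, subset_frac=0.1, rng=None):
    """Score a random fraction of the pool, then take the top query_size.

    pool_probs: (n_pool, n_classes) model probabilities on the unlabelled
    pool. Returns pool indices chosen for labelling. Scoring only a subset
    cuts per-round cost roughly by 1 / subset_frac.
    """
    rng = np.random.default_rng(rng)
    n = len(pool_probs)
    subset = rng.choice(n, size=max(query_size, int(n * subset_frac)),
                        replace=False)
    scores = entropy(pool_probs[subset])
    top = np.argsort(scores)[::-1][:query_size]
    return subset[top]

# Toy pool: sample 1 is maximally uncertain; subset_frac=1.0 scores everything.
pool = np.array([[0.99, 0.01], [0.5, 0.5], [0.9, 0.1]])
picked = partial_uncertainty_query(pool, query_size=1, subset_frac=1.0, rng=0)
```

A larger query size amortises the remaining scoring cost over more labels acquired per round, at the price of some redundancy within each batch.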
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.