On the Marginal Benefit of Active Learning: Does Self-Supervision Eat
Its Cake?
- URL: http://arxiv.org/abs/2011.08121v1
- Date: Mon, 16 Nov 2020 17:34:55 GMT
- Title: On the Marginal Benefit of Active Learning: Does Self-Supervision Eat
Its Cake?
- Authors: Yao-Chun Chan, Mingchen Li, Samet Oymak
- Abstract summary: We present a novel framework integrating self-supervised pretraining, active learning, and consistency-regularized self-training.
Our experiments reveal two key insights: (i) self-supervised pre-training significantly improves semi-supervised learning, especially in the few-label regime;
(ii) the benefit of active learning is undermined and subsumed by S4L techniques: we fail to observe any additional benefit of state-of-the-art active learning algorithms when combined with state-of-the-art S4L techniques.
- Score: 31.563514432259897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning is the set of techniques for intelligently labeling large
unlabeled datasets to reduce the labeling effort. In parallel, recent
developments in self-supervised and semi-supervised learning (S4L) provide
powerful techniques, based on data augmentation, contrastive learning, and
self-training, that make superior use of unlabeled data and have led to a
significant reduction in the labeling required on standard machine learning
benchmarks. A natural question is whether these paradigms can be unified to
obtain superior results. To this end, this paper provides a novel algorithmic
framework integrating self-supervised pretraining, active learning, and
consistency-regularized self-training. We conduct extensive experiments with
our framework on CIFAR10 and CIFAR100 datasets. These experiments enable us to
isolate and assess the benefits of individual components which are evaluated
using state-of-the-art methods (e.g. Core-Set, VAAL, SimCLR, FixMatch). Our
experiments reveal two key insights: (i) Self-supervised pre-training
significantly improves semi-supervised learning, especially in the few-label
regime, (ii) The benefit of active learning is undermined and subsumed by S4L
techniques. Specifically, we fail to observe any additional benefit of
state-of-the-art active learning algorithms when combined with state-of-the-art
S4L techniques.
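To make the moving pieces concrete, here is a minimal, hypothetical sketch of how the framework's components can interact; it is not the authors' released code. The encoder is assumed to be SimCLR-pretrained, the query step is a Core-Set-style k-center-greedy selection on frozen embeddings, and the training step is a FixMatch-style consistency loss. All names, thresholds, and weights are illustrative.

```python
import numpy as np
import torch
import torch.nn.functional as F

def k_center_greedy(embeddings, labeled_idx, budget):
    """Core-Set-style query: greedily pick the unlabeled point farthest
    from the current labeled set, then update distances (k-center greedy)."""
    dists = np.linalg.norm(
        embeddings[:, None, :] - embeddings[None, labeled_idx, :], axis=-1
    ).min(axis=1)
    selected = []
    for _ in range(budget):
        i = int(np.argmax(dists))                       # farthest remaining point
        selected.append(i)
        new_d = np.linalg.norm(embeddings - embeddings[i], axis=1)
        dists = np.minimum(dists, new_d)                # point i is now a center
    return selected

def fixmatch_loss(model, x_lab, y_lab, x_weak, x_strong, tau=0.95, lam=1.0):
    """FixMatch-style objective: supervised cross-entropy plus consistency
    between confident pseudo-labels from weakly augmented views and
    predictions on strongly augmented views of the same unlabeled images."""
    sup = F.cross_entropy(model(x_lab), y_lab)
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= tau).float()                    # confidence gate
    unsup = (F.cross_entropy(model(x_strong), pseudo, reduction="none") * mask).mean()
    return sup + lam * unsup
```

In this sketch, a labeling round would embed the pool with the pretrained encoder, call k_center_greedy to pick the next batch to annotate, and then continue consistency-regularized training with fixmatch_loss; the paper's finding is that, once strong S4L components are in place, swapping k_center_greedy for random selection changes little.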
Related papers
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [75.78948575957081]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
Incremental Self-training (IST) is simple yet effective and fits existing self-training-based semi-supervised learning methods; a generic sketch of one self-training round follows this entry.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
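As a rough illustration of the basic self-training loop such methods build on (not the IST paper's implementation, which is incremental; the scikit-learn-style model API and the confidence threshold are assumptions of this sketch):

```python
import numpy as np

def self_training_round(model, X_lab, y_lab, X_unlab, threshold=0.9):
    """One generic self-training round: fit on the labeled pool, then promote
    confident predictions on unlabeled data to pseudo-labels."""
    model.fit(X_lab, y_lab)
    probs = model.predict_proba(X_unlab)
    conf = probs.max(axis=1)
    keep = conf >= threshold                       # keep confident predictions only
    X_lab = np.concatenate([X_lab, X_unlab[keep]])
    y_lab = np.concatenate([y_lab, probs[keep].argmax(axis=1)])
    return X_lab, y_lab, X_unlab[~keep]            # shrunken unlabeled pool
```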
- BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data; a toy sketch of this trade-off follows the entry.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z)
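BAL's actual sub-pool construction is more elaborate; the following hypothetical sketch only shows the basic diversity-versus-uncertainty trade-off such methods balance (all names and the mixing weight alpha are illustrative):

```python
import numpy as np

def balanced_scores(embeddings, probs, labeled_idx, alpha=0.5):
    """Score points by mixing diversity (distance to the labeled set in
    embedding space) with uncertainty (predictive entropy). Assumes rows of
    embeddings and probs align; mask labeled rows before ranking."""
    dists = np.linalg.norm(
        embeddings[:, None, :] - embeddings[None, labeled_idx, :], axis=-1
    ).min(axis=1)
    diversity = dists / (dists.max() + 1e-8)           # scale to [0, 1]
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=1)
    uncertainty = entropy / np.log(probs.shape[1])     # max entropy is log(C)
    return alpha * diversity + (1.0 - alpha) * uncertainty
```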
- Active Learning Guided by Efficient Surrogate Learners [25.52920030051264]
Re-training a deep learning model each time a single data point receives a new label is impractical.
We introduce a new active learning algorithm that harnesses the power of a Gaussian process surrogate in conjunction with the neural network principal learner.
Our proposed model adeptly updates the surrogate learner for every new data instance, enabling it to emulate and capitalize on the continuous learning dynamics of the neural network; a simplified sketch follows this entry.
arXiv Detail & Related papers (2023-01-07T01:35:25Z)
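A simplified picture of the surrogate idea, assuming the surrogate regresses the principal learner's per-sample loss from frozen embeddings (the paper updates its surrogate incrementally, whereas this sketch simply refits; all names are illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def surrogate_query(emb_lab, loss_lab, emb_unlab, budget):
    """Fit a cheap Gaussian-process surrogate mapping embeddings to the
    principal learner's loss, then query where predicted loss plus
    surrogate uncertainty is largest (no network retraining needed)."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(emb_lab, loss_lab)
    mean, std = gp.predict(emb_unlab, return_std=True)
    acquisition = mean + std                  # UCB-style acquisition score
    return np.argsort(-acquisition)[:budget]  # indices of the best queries
```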
- Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
- Hyperspherical Consistency Regularization [45.00073340936437]
We explore the relationship between self-supervised learning and supervised learning, and study how self-supervised learning helps robust data-efficient deep learning.
We propose hyperspherical consistency regularization (HCR), a simple yet effective plug-and-play method, to regularize the classifier using feature-dependent information and thus avoid bias from labels; a generic sketch of the idea follows this entry.
arXiv Detail & Related papers (2022-06-02T02:41:13Z)
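HCR's exact objective is defined in the paper; the sketch below only conveys the generic flavor of a label-free, feature-dependent regularizer on the unit hypersphere, expressed here as agreement between pairwise similarity structures (an assumption of this illustration, not the published loss):

```python
import torch
import torch.nn.functional as F

def hyperspherical_consistency(features, logits):
    """Project features and logits onto the unit hypersphere and penalize
    disagreement between their within-batch cosine-similarity structures."""
    f = F.normalize(features, dim=1)   # backbone features on the sphere
    z = F.normalize(logits, dim=1)     # classifier outputs on the sphere
    sim_f = f @ f.t()                  # pairwise similarities, feature space
    sim_z = z @ z.t()                  # pairwise similarities, output space
    return F.mse_loss(sim_z, sim_f)    # label-free consistency penalty
```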
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Reducing Label Effort: Self-Supervised meets Active Learning [32.4747118398236]
Recent developments in self-training have achieved very impressive results rivaling supervised learning on some datasets.
Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort.
The performance gap between active learning trained with self-training and active learning trained from scratch diminishes as we approach the point where almost half of the dataset is labeled.
arXiv Detail & Related papers (2021-08-25T20:04:44Z)
- Rebuilding Trust in Active Learning with Actionable Metrics [77.99796068970569]
Active Learning (AL) is an active domain of research, but is seldom used in industry despite pressing needs.
This is in part due to a misalignment of objectives: research strives for the best results on selected datasets, while practitioners need metrics they can act on.
We present various actionable metrics to help rebuild trust of industrial practitioners in Active Learning.
arXiv Detail & Related papers (2020-12-18T09:34:59Z)
- A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching [17.064993611446898]
In this paper, we build a unified active learning benchmark framework for EM.
The goal of the framework is to enable concrete guidelines for practitioners as to what active learning combinations will work well for EM.
Our framework also includes novel optimizations that improve the quality of the learned model by roughly 9% in terms of F1-score and reduce example selection latencies by up to 10x without affecting the quality of the model.
arXiv Detail & Related papers (2020-03-29T19:08:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.