Active Learning at the ImageNet Scale
- URL: http://arxiv.org/abs/2111.12880v1
- Date: Thu, 25 Nov 2021 02:48:51 GMT
- Title: Active Learning at the ImageNet Scale
- Authors: Zeyad Ali Sami Emam, Hong-Min Chu, Ping-Yeh Chiang, Wojciech Czaja,
Richard Leapman, Micah Goldblum, Tom Goldstein
- Abstract summary: In this work, we study a combination of active learning (AL) and self-supervised pretraining (SSP) on ImageNet.
We find that performance on small toy datasets is not representative of performance on ImageNet due to the class-imbalanced samples selected by an active learner.
We propose Balanced Selection (BASE), a simple, scalable AL algorithm that outperforms random sampling consistently.
- Score: 43.595076693347835
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Active learning (AL) algorithms aim to identify an optimal subset of data for
annotation, such that deep neural networks (DNN) can achieve better performance
when trained on this labeled subset. AL is especially impactful in industrial
scale settings where data labeling costs are high and practitioners use every
tool at their disposal to improve model performance. The recent success of
self-supervised pretraining (SSP) highlights the importance of harnessing
abundant unlabeled data to boost model performance. By combining AL with SSP,
we can make use of unlabeled data while simultaneously labeling and training on
particularly informative samples.
In this work, we study a combination of AL and SSP on ImageNet. We find that
performance on small toy datasets -- the typical benchmark setting in the
literature -- is not representative of performance on ImageNet due to the
class-imbalanced samples selected by an active learner. Among the baselines
we test, popular AL algorithms across a variety of small- and large-scale
settings fail to outperform random sampling. To remedy the class-imbalance
problem, we propose Balanced Selection (BASE), a simple, scalable AL algorithm
that outperforms random sampling consistently by selecting more balanced
samples for annotation than existing methods. Our code is available at:
https://github.com/zeyademam/active_learning .
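The abstract describes BASE only at a high level. As an illustration of what class-balanced acquisition can look like, the sketch below spreads the labeling budget evenly across the classes the model predicts for the unlabeled pool; the function and variable names here are hypothetical, and the authoritative implementation is the repository linked above.

```python
# Illustrative sketch of class-balanced active selection in the spirit of
# BASE as described in the abstract: rather than taking the top-k most
# uncertain samples globally (which can skew heavily toward a few classes),
# spread the annotation budget evenly across predicted classes.
import numpy as np

def select_balanced(pseudo_labels, uncertainty, num_classes, budget):
    """Pick `budget` pool indices, distributed across predicted classes.

    pseudo_labels: model's predicted class for each unlabeled sample.
    uncertainty:   acquisition score per sample (higher = more informative).
    """
    per_class = budget // num_classes
    chosen = []
    for c in range(num_classes):
        idx = np.where(pseudo_labels == c)[0]
        # Most uncertain samples within this predicted class.
        top = idx[np.argsort(-uncertainty[idx])][:per_class]
        chosen.extend(top.tolist())
    # Fill any shortfall (classes with too few pool samples) globally,
    # in descending order of uncertainty.
    mask = np.ones(len(uncertainty), dtype=bool)
    mask[chosen] = False
    order = np.argsort(-uncertainty)
    remaining = order[mask[order]]
    chosen.extend(remaining[: budget - len(chosen)].tolist())
    return np.asarray(chosen[:budget])

# Toy usage: a pool of 10k samples, 100 classes, budget of 1000 labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 100, size=10_000)
scores = rng.random(10_000)
picked = select_balanced(labels, scores, num_classes=100, budget=1_000)
print(picked.shape)  # (1000,)
```

A purely global top-k selection by uncertainty tends to concentrate the budget on a few hard classes; the per-class quota is what keeps the labeled set balanced, which is the failure mode the abstract attributes to existing AL methods at ImageNet scale.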
Related papers
- Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active-learning-based adaptive sampling strategy to optimize sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
arXiv Detail & Related papers (2024-07-09T23:09:18Z) - Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner.
A mean-teacher model is then employed to correct labels of noisy samples.
We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - MoBYv2AL: Self-supervised Active Learning for Image Classification [57.4372176671293]
- MoBYv2AL: Self-supervised Active Learning for Image Classification [57.4372176671293]
We present MoBYv2AL, a novel self-supervised active learning framework for image classification.
Our contribution lies in lifting MoBY, one of the most successful self-supervised learning algorithms, to the AL pipeline.
We achieve state-of-the-art results when compared to recent AL methods.
arXiv Detail & Related papers (2023-01-04T10:52:02Z) - Active Transfer Prototypical Network: An Efficient Labeling Algorithm
for Time-Series Data [1.7205106391379026]
This paper proposes a novel Few-Shot Learning (FSL)-based AL framework, which addresses the trade-off problem by incorporating a Prototypical Network (ProtoNet) in the AL iterations.
This framework was validated on UCI HAR/HAPT dataset and a real-world braking maneuver dataset.
The learning performance significantly surpasses traditional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.
arXiv Detail & Related papers (2022-09-28T16:14:40Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Pareto Optimization for Active Learning under Out-of-Distribution Data
- Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios [79.02009938011447]
We propose a sampling scheme, which selects optimal subsets of unlabeled samples with fixed batch size from the unlabeled data pool.
Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
arXiv Detail & Related papers (2022-07-04T04:11:44Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.