Active Learning at the ImageNet Scale
- URL: http://arxiv.org/abs/2111.12880v1
- Date: Thu, 25 Nov 2021 02:48:51 GMT
- Title: Active Learning at the ImageNet Scale
- Authors: Zeyad Ali Sami Emam, Hong-Min Chu, Ping-Yeh Chiang, Wojciech Czaja,
Richard Leapman, Micah Goldblum, Tom Goldstein
- Abstract summary: In this work, we study a combination of active learning (AL) and self-supervised pretraining (SSP) on ImageNet.
We find that performance on small toy datasets is not representative of performance on ImageNet due to the class-imbalanced samples selected by an active learner.
We propose Balanced Selection (BASE), a simple, scalable AL algorithm that outperforms random sampling consistently.
- Score: 43.595076693347835
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Active learning (AL) algorithms aim to identify an optimal subset of data for
annotation, such that deep neural networks (DNN) can achieve better performance
when trained on this labeled subset. AL is especially impactful in industrial
scale settings where data labeling costs are high and practitioners use every
tool at their disposal to improve model performance. The recent success of
self-supervised pretraining (SSP) highlights the importance of harnessing
abundant unlabeled data to boost model performance. By combining AL with SSP,
we can make use of unlabeled data while simultaneously labeling and training on
particularly informative samples.
In this work, we study a combination of AL and SSP on ImageNet. We find that
performance on small toy datasets -- the typical benchmark setting in the
literature -- is not representative of performance on ImageNet due to the
class-imbalanced samples selected by an active learner. Among the baselines
we test, popular AL algorithms across a variety of small- and large-scale
settings fail to outperform random sampling. To remedy the class-imbalance
problem, we propose Balanced Selection (BASE), a simple, scalable AL algorithm
that outperforms random sampling consistently by selecting more balanced
samples for annotation than existing methods. Our code is available at:
https://github.com/zeyademam/active_learning .
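The abstract describes BASE only at a high level. As an illustration of what class-balanced acquisition can look like, the sketch below spreads the labeling budget evenly across the classes the model predicts for the unlabeled pool; the function and variable names here are hypothetical, and the authoritative implementation is the repository linked above.

```python
# Illustrative sketch of class-balanced active selection in the spirit of
# BASE as described in the abstract: rather than taking the top-k most
# uncertain samples globally (which can skew heavily toward a few classes),
# spread the annotation budget evenly across predicted classes.
import numpy as np

def select_balanced(pseudo_labels, uncertainty, num_classes, budget):
    """Pick `budget` pool indices, distributed across predicted classes.

    pseudo_labels: model's predicted class for each unlabeled sample.
    uncertainty:   acquisition score per sample (higher = more informative).
    """
    per_class = budget // num_classes
    chosen = []
    for c in range(num_classes):
        idx = np.where(pseudo_labels == c)[0]
        # Most uncertain samples within this predicted class.
        top = idx[np.argsort(-uncertainty[idx])][:per_class]
        chosen.extend(top.tolist())
    # Fill any shortfall (classes with too few pool samples) globally,
    # in descending order of uncertainty.
    mask = np.ones(len(uncertainty), dtype=bool)
    mask[chosen] = False
    order = np.argsort(-uncertainty)
    remaining = order[mask[order]]
    chosen.extend(remaining[: budget - len(chosen)].tolist())
    return np.asarray(chosen[:budget])

# Toy usage: a pool of 10k samples, 100 classes, budget of 1000 labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 100, size=10_000)
scores = rng.random(10_000)
picked = select_balanced(labels, scores, num_classes=100, budget=1_000)
print(picked.shape)  # (1000,)
```

A purely global top-k selection by uncertainty tends to concentrate the budget on a few hard classes; the per-class quota is what keeps the labeled set balanced, which is the failure mode the abstract attributes to existing AL methods at ImageNet scale.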
Related papers
- Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active-learning-based adaptive sampling strategy to optimize sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
arXiv Detail & Related papers (2024-07-09T23:09:18Z) - Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner.
A mean-teacher model is then employed to correct labels of noisy samples.
We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - MoBYv2AL: Self-supervised Active Learning for Image Classification [57.4372176671293]
- MoBYv2AL: Self-supervised Active Learning for Image Classification [57.4372176671293]
We present MoBYv2AL, a novel self-supervised active learning framework for image classification.
Our contribution lies in lifting MoBY, one of the most successful self-supervised learning algorithms, to the AL pipeline.
We achieve state-of-the-art results when compared to recent AL methods.
arXiv Detail & Related papers (2023-01-04T10:52:02Z) - Active Transfer Prototypical Network: An Efficient Labeling Algorithm
for Time-Series Data [1.7205106391379026]
This paper proposes a novel Few-Shot Learning (FSL)-based AL framework, which addresses the trade-off problem by incorporating a Prototypical Network (ProtoNet) in the AL iterations.
This framework was validated on UCI HAR/HAPT dataset and a real-world braking maneuver dataset.
The learning performance significantly surpasses traditional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.
arXiv Detail & Related papers (2022-09-28T16:14:40Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Pareto Optimization for Active Learning under Out-of-Distribution Data
- Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios [79.02009938011447]
We propose a sampling scheme, which selects optimal subsets of unlabeled samples with fixed batch size from the unlabeled data pool.
Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
arXiv Detail & Related papers (2022-07-04T04:11:44Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.