An Empirical Study on the Efficacy of Deep Active Learning for Image
Classification
- URL: http://arxiv.org/abs/2212.03088v1
- Date: Wed, 30 Nov 2022 17:44:59 GMT
- Title: An Empirical Study on the Efficacy of Deep Active Learning for Image
Classification
- Authors: Yu Li, Muxi Chen, Yannan Liu, Daojing He, and Qiang Xu
- Abstract summary: Deep Active Learning (DAL) has been advocated as a promising method to reduce labeling costs in supervised learning.
Existing evaluations of DAL methods are based on different settings, and their results are controversial.
This paper comprehensively evaluates 19 existing DAL methods in a uniform setting.
- Score: 11.398892277968427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Active Learning (DAL) has been advocated as a promising method to reduce
labeling costs in supervised learning. However, existing evaluations of DAL
methods are based on different settings, and their results are controversial.
To tackle this issue, this paper comprehensively evaluates 19 existing DAL
methods in a uniform setting, including traditional
fully-supervised active learning (SAL) strategies and emerging
semi-supervised active learning (SSAL) techniques. We have several
non-trivial findings. First, most SAL methods cannot achieve higher accuracy
than random selection. Second, semi-supervised training brings significant
performance improvement compared to pure SAL methods. Third, performing data
selection in the SSAL setting can achieve a significant and consistent
performance improvement, especially with abundant unlabeled data. Our findings
produce the following guidance for practitioners: one should (i) apply SSAL
early and (ii) collect more unlabeled data whenever possible, for better model
performance.
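To make the evaluated setting concrete, below is a minimal sketch of one active-learning query round (our illustration, not the paper's code). It contrasts the random-selection baseline that most SAL methods fail to beat with classic entropy-based uncertainty sampling; in the SSAL setting the same selection step is used, but the model is then retrained semi-supervised on the labeled set plus the remaining unlabeled pool.

```python
# Illustrative sketch of one active-learning query round; not the paper's code.
import numpy as np

def query_indices(probs, budget, strategy="entropy", rng=None):
    """Pick `budget` unlabeled samples to send for annotation.

    probs: (N, C) predicted class probabilities for the unlabeled pool.
    """
    rng = rng or np.random.default_rng(0)
    if strategy == "random":
        # The baseline that, per the paper, most SAL methods fail to beat.
        return rng.choice(len(probs), size=budget, replace=False)
    if strategy == "entropy":
        # Classic uncertainty sampling: label the most uncertain predictions.
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        return np.argsort(-entropy)[:budget]
    raise ValueError(f"unknown strategy: {strategy}")

# Toy usage: 1000 pool samples, 10 classes, annotate 50 this round.
pool_probs = np.random.default_rng(1).dirichlet(np.ones(10), size=1000)
picked = query_indices(pool_probs, budget=50, strategy="entropy")
```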
Related papers
- Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [56.24431208419858]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset.
We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
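A rough illustration of the relabeling idea follows; the field names and the way the quality score is injected into the prompt are our assumptions, not necessarily the paper's exact scheme.

```python
# Hypothetical sketch: condition each response on its own quality score so the
# model learns from the full spectrum of response quality, not only the
# chosen/rejected distinction. Field names are assumptions for illustration.
def reward_augment(preference_pairs):
    """preference_pairs: dicts with keys prompt, chosen, rejected,
    score_chosen, score_rejected."""
    augmented = []
    for ex in preference_pairs:
        for response, score in [(ex["chosen"], ex["score_chosen"]),
                                (ex["rejected"], ex["score_rejected"])]:
            augmented.append({
                "prompt": f"[reward = {score:.2f}] {ex['prompt']}",
                "response": response,
            })
    return augmented
```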
arXiv Detail & Related papers (2024-10-10T16:01:51Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
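As a rough sketch of the adaptive-label-smoothing idea (our reading of the summary, not the authors' implementation), a per-sample smoothing value can be scaled by the sample's uncertainty:

```python
# Hypothetical sketch: more uncertain samples receive a larger label-smoothing
# value, so their one-hot targets are softened more during fine-tuning.
import numpy as np

def smoothed_targets(labels, uncertainty, num_classes, max_smoothing=0.3):
    """labels: (N,) int class ids; uncertainty: (N,) values in [0, 1]."""
    eps = max_smoothing * uncertainty                     # per-sample smoothing
    targets = np.zeros((len(labels), num_classes))
    targets += (eps / num_classes)[:, None]               # uniform smoothing mass
    targets[np.arange(len(labels)), labels] += 1.0 - eps  # remaining mass on label
    return targets
```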
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models.
There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
arXiv Detail & Related papers (2024-04-22T17:20:18Z)
- Benchmarking of Query Strategies: Towards Future Deep Active Learning [0.0]
We benchmark query strategies for deep active learning (DAL).
DAL reduces annotation costs by annotating only high-quality samples selected by query strategies.
arXiv Detail & Related papers (2023-12-10T04:17:16Z)
- DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection [72.25697820290502]
This work introduces a straightforward and efficient strategy to identify potential novel classes through zero-shot classification.
We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, or re-training.
Empirical evaluations on three datasets, including LVIS, V3Det, and COCO, demonstrate significant improvements over the baseline performance.
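A generic sketch of the pseudo-labeling step is shown below; it is a simplification of the summary (the thresholding rule and function names are our assumptions, not DST-Det's exact procedure).

```python
# Hypothetical sketch: proposals whose zero-shot probability for a novel class
# exceeds a threshold are kept as pseudo ground truth for self-training.
import numpy as np

def mine_novel_pseudo_labels(proposal_feats, class_text_embs, novel_class_ids,
                             threshold=0.8, temperature=100.0):
    """proposal_feats: (P, D) and class_text_embs: (C, D), both L2-normalized."""
    logits = temperature * (proposal_feats @ class_text_embs.T)  # cosine similarity
    logits -= logits.max(axis=1, keepdims=True)                  # stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    pseudo = []
    for i, p in enumerate(probs):
        cls = int(p.argmax())
        if cls in novel_class_ids and p[cls] >= threshold:
            pseudo.append((i, cls))   # (proposal index, novel class id)
    return pseudo
```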
arXiv Detail & Related papers (2023-10-02T17:52:24Z)
- TiDAL: Learning Training Dynamics for Active Learning [10.832194534164142]
We propose Training Dynamics for Active Learning (TiDAL) to quantify uncertainties of unlabeled data.
Since tracking the TD of every sample in a large-scale unlabeled pool is impractical, TiDAL utilizes an additional prediction module that learns the TD of labeled data.
Our TiDAL achieves better or comparable performance on both balanced and imbalanced benchmark datasets compared to state-of-the-art AL methods.
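A loose sketch of predicting training dynamics (TD) for unlabeled data follows; it uses our simplifications (a least-squares predictor and a variability statistic), not TiDAL's actual module.

```python
# Hypothetical sketch: compute a TD statistic on labeled data, fit a simple
# predictor from features to that statistic, and score the unlabeled pool.
import numpy as np

def td_uncertainty_scores(labeled_feats, labeled_probs_per_epoch, labels,
                          unlabeled_feats):
    """labeled_probs_per_epoch: (E, N, C) softmax outputs recorded every epoch."""
    # Variability of the true-class probability over training epochs.
    true_probs = labeled_probs_per_epoch[:, np.arange(len(labels)), labels]  # (E, N)
    td_stat = true_probs.std(axis=0)                                         # (N,)
    # Stand-in "prediction module": least squares from features to the statistic.
    X = np.hstack([labeled_feats, np.ones((len(labels), 1))])
    w, *_ = np.linalg.lstsq(X, td_stat, rcond=None)
    Xu = np.hstack([unlabeled_feats, np.ones((len(unlabeled_feats), 1))])
    return Xu @ w  # higher predicted variability -> more informative to label
```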
arXiv Detail & Related papers (2022-10-13T06:54:50Z)
- A Comparative Survey of Deep Active Learning [76.04825433362709]
Active Learning (AL) is a set of techniques for reducing labeling cost by sequentially selecting data samples from a large unlabeled data pool for labeling.
Deep Learning (DL) is data-hungry, and the performance of DL models scales monotonically with more training data.
In recent years, Deep Active Learning (DAL) has risen as a feasible solution for maximizing model performance while minimizing the expensive labeling cost.
arXiv Detail & Related papers (2022-03-25T05:17:24Z)
- Revisiting the Performance of iALS on Item Recommendation Benchmarks [19.704506591363256]
Matrix factorization learned by implicit alternating least squares (iALS) is a popular baseline in recommender system research publications.
Recent studies suggest that its prediction quality is not competitive with the current state of the art.
We revisit four well-studied benchmarks where iALS was reported to perform poorly and show that with proper tuning, iALS is highly competitive.
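For context, here is a compact sketch of the classic implicit-feedback ALS update (Hu, Koren & Volinsky, 2008) that iALS implements; the dense-matrix simplification and function name are ours, and the paper's point is that tuning hyperparameters such as the regularization, confidence weight, and embedding dimension is what makes iALS competitive.

```python
# Minimal dense iALS sketch: alternating closed-form updates with
# confidence weights c = 1 + alpha * r and binary preferences p = [r > 0].
import numpy as np

def ials(R, factors=32, reg=0.1, alpha=40.0, iters=10, seed=0):
    """R: (num_users, num_items) nonnegative interaction counts (dense toy case)."""
    rng = np.random.default_rng(seed)
    num_users, num_items = R.shape
    X = 0.01 * rng.standard_normal((num_users, factors))   # user factors
    Y = 0.01 * rng.standard_normal((num_items, factors))   # item factors
    P = (R > 0).astype(float)                               # binary preferences
    C = 1.0 + alpha * R                                     # confidence weights
    for _ in range(iters):
        # Update user factors with items fixed, then item factors with users fixed.
        for A, B, Pm, Cm in [(X, Y, P, C), (Y, X, P.T, C.T)]:
            BtB = B.T @ B
            for u in range(len(A)):
                Cu = Cm[u]
                # B^T C_u B = B^T B + B^T (C_u - I) B, without forming diag(C_u).
                M = BtB + (B.T * (Cu - 1.0)) @ B + reg * np.eye(factors)
                A[u] = np.linalg.solve(M, B.T @ (Cu * Pm[u]))
    return X, Y
```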
arXiv Detail & Related papers (2021-10-26T21:30:57Z)
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification [38.68079253627819]
Our benchmark consists of two fine-grained classification datasets obtained by sampling classes from the Aves and Fungi taxonomy.
We find that recently proposed SSL methods provide significant benefits, and can effectively use out-of-class data to improve performance when deep networks are trained from scratch.
Our work suggests that semi-supervised learning with experts on realistic datasets may require different strategies than those currently prevalent in the literature.
arXiv Detail & Related papers (2021-04-01T17:59:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.