BAL: Balancing Diversity and Novelty for Active Learning
- URL: http://arxiv.org/abs/2312.15944v1
- Date: Tue, 26 Dec 2023 08:14:46 GMT
- Title: BAL: Balancing Diversity and Novelty for Active Learning
- Authors: Jingyao Li, Pengguang Chen, Shaozuo Yu, Shu Liu, and Jiaya Jia
- Abstract summary: We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
- Score: 53.289700543331925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of Active Learning is to strategically label a subset of the
dataset to maximize performance within a predetermined labeling budget. In this
study, we harness features acquired through self-supervised learning. We
introduce a straightforward yet potent metric, Cluster Distance Difference, to
identify diverse data. Subsequently, we introduce a novel framework, Balancing
Active Learning (BAL), which constructs adaptive sub-pools to balance diverse
and uncertain data. Our approach outperforms all established active learning
methods on widely recognized benchmarks by 1.20%. Moreover, we assess the
efficacy of our proposed framework under extended settings, encompassing both
larger and smaller labeling budgets. Experimental results demonstrate that,
when labeling 80% of the samples, the performance of the current SOTA method
declines by 0.74%, whereas our proposed BAL achieves performance comparable to
the full dataset. Code is available at https://github.com/JulietLJY/BAL.
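For intuition, here is a minimal Python sketch of a cluster-distance-difference style diversity score plus a diversity/uncertainty-balanced selection step. The exact metric and sub-pool construction are defined in the paper and the linked repository; the function names and the nearest-versus-second-nearest-centroid reading below are illustrative assumptions, not the official implementation.

```python
# Hypothetical sketch of a "cluster distance difference" diversity score
# and a balanced selection step. The real definitions live in
# https://github.com/JulietLJY/BAL; names and formulas here are assumed.
import numpy as np
from sklearn.cluster import KMeans

def cluster_distance_difference(features, n_clusters=10):
    """Gap between each sample's nearest and second-nearest centroid
    distances; a small gap marks a boundary (diverse) sample.
    NOTE: this nearest-vs-second-nearest reading is an assumption."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    d = np.sort(km.transform(features), axis=1)  # (N, K), ascending per row
    return d[:, 1] - d[:, 0]

def balanced_select(features, uncertainty, budget, pool_factor=3):
    """Form a sub-pool of the most diverse samples, then label the most
    uncertain ones inside it (a simplification of BAL's adaptive sub-pools)."""
    cdd = cluster_distance_difference(features)
    sub_pool = np.argsort(cdd)[: budget * pool_factor]     # smallest gap first
    ranked = sub_pool[np.argsort(-uncertainty[sub_pool])]  # most uncertain first
    return ranked[:budget]

# Toy usage with random "self-supervised" features and uncertainty scores.
rng = np.random.default_rng(0)
picked = balanced_select(rng.normal(size=(1000, 128)), rng.random(1000), budget=50)
print(picked.shape)  # (50,)
```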
Related papers
- Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement [8.509688686402438]
Finetuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following capabilities.
This work addresses the question: How can we determine the optimal subset of data for effective training?
Our method employs k-means clustering to ensure the selected subset effectively represents the full dataset.
arXiv Detail & Related papers (2024-09-17T17:25:31Z)
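As a rough illustration of that k-means step, the sketch below picks the sample nearest each centroid as a representative of the full dataset; the paper's iterative refinement loop is not reproduced here, and all names are hypothetical.

```python
# Hypothetical k-means representative selection: one sample per centroid,
# so the subset spans the dataset. The iterative refinement is omitted.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_representatives(embeddings, k):
    """Return the index of the sample closest to each of k centroids."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    dists = km.transform(embeddings)   # (N, k) sample-to-centroid distances
    return np.argmin(dists, axis=0)    # one representative per cluster

emb = np.random.default_rng(1).normal(size=(5000, 64))
subset = kmeans_representatives(emb, k=100)  # 100 representative indices
```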
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: the label smoothing value is set adaptively during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
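A minimal PyTorch sketch of that per-sample smoothing idea, assuming an uncertainty score in [0, 1] and a simple linear mapping from uncertainty to smoothing strength (the mapping and names are assumptions, not the paper's exact rule):

```python
# Hypothetical per-sample label smoothing scaled by predictive uncertainty,
# in the spirit of UAL; the linear uncertainty -> epsilon mapping is assumed.
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, uncertainty, max_eps=0.2):
    """Cross-entropy with per-sample label smoothing: more uncertain
    samples get softer targets."""
    n_classes = logits.size(-1)
    eps = (uncertainty * max_eps).unsqueeze(-1)  # (B, 1) smoothing per sample
    soft = F.one_hot(targets, n_classes).float() * (1 - eps) + eps / n_classes
    return -(soft * F.log_softmax(logits, dim=-1)).sum(-1).mean()

logits, targets = torch.randn(8, 5), torch.randint(0, 5, (8,))
uncertainty = torch.rand(8)  # e.g. normalized predictive entropy in [0, 1]
print(ual_loss(logits, targets, uncertainty))
```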
- Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) proposes learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Revisiting Deep Active Learning for Semantic Segmentation [37.3546941940388]
We show that the data distribution is decisive for the performance of the various active learning objectives proposed in the literature.
We demonstrate that the integration of semi-supervised learning with active learning can improve performance when the two objectives are aligned.
arXiv Detail & Related papers (2023-02-08T14:23:37Z)
- Improving Robustness and Efficiency in Active Learning with Contrastive Loss [13.994967246046008]
This paper introduces supervised contrastive active learning (SCAL) by leveraging the contrastive loss for active learning in a supervised setting.
We propose efficient query strategies in active learning to select unbiased and informative data samples of diverse feature representations.
arXiv Detail & Related papers (2021-09-13T21:09:21Z)
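For context, here is a compact sketch of the standard supervised contrastive (SupCon) objective that this line of work builds on; SCAL's specific query strategies are not reproduced, and this is the generic formulation rather than the paper's code.

```python
# Standard SupCon loss over a batch: pull same-label embeddings together,
# push others apart. SCAL-specific query logic is not shown here.
import torch
import torch.nn.functional as F

def supcon_loss(feats, labels, tau=0.1):
    """feats: (B, D) embeddings; labels: (B,) integer class labels."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / tau                     # (B, B) similarities
    self_mask = torch.eye(len(feats), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, -1e9)            # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = ((labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask).float()
    # mean log-probability over each anchor's positives
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

feats, labels = torch.randn(16, 32), torch.randint(0, 4, (16,))
print(supcon_loss(feats, labels))
```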
- Rebuilding Trust in Active Learning with Actionable Metrics [77.99796068970569]
Active Learning (AL) is an active domain of research, but it is seldom used in industry despite pressing needs.
This is in part due to a misalignment of objectives, with research striving for the best results on selected datasets.
We present various actionable metrics to help rebuild the trust of industrial practitioners in Active Learning.
arXiv Detail & Related papers (2020-12-18T09:34:59Z)
- Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective in keyword detection tasks in the regime where only a few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z)
- A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching [17.064993611446898]
In this paper, we build a unified active learning benchmark framework for EM.
The goal of the framework is to provide concrete guidelines for practitioners on which active learning combinations work well for EM.
Our framework also includes novel optimizations that improve the quality of the learned model by roughly 9% in terms of F1-score and reduce example selection latencies by up to 10x without affecting the quality of the model.
arXiv Detail & Related papers (2020-03-29T19:08:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.