A Cross-Domain Benchmark for Active Learning
- URL: http://arxiv.org/abs/2408.00426v2
- Date: Tue, 12 Nov 2024 13:41:47 GMT
- Title: A Cross-Domain Benchmark for Active Learning
- Authors: Thorben Werner, Johannes Burchert, Maximilian Stubbemann, Lars Schmidt-Thieme,
- Abstract summary: Active Learning deals with identifying the most informative samples for labeling to reduce data annotation costs.
We propose CDALBench, the first active learning benchmark which includes tasks in computer vision and natural language processing.
We show, that both the cross-domain character and a large amount of repetitions are crucial for sophisticated evaluation of AL research.
- Score: 5.359176539960004
- License:
- Abstract: Active Learning (AL) deals with identifying the most informative samples for labeling to reduce data annotation costs for supervised learning tasks. AL research suffers from the fact that lifts from literature generalize poorly and that only a small number of repetitions of experiments are conducted. To overcome these obstacles, we propose CDALBench, the first active learning benchmark which includes tasks in computer vision, natural language processing and tabular learning. Furthermore, by providing an efficient, greedy oracle, CDALBench can be evaluated with 50 runs for each experiment. We show, that both the cross-domain character and a large amount of repetitions are crucial for sophisticated evaluation of AL research. Concretely, we show that the superiority of specific methods varies over the different domains, making it important to evaluate Active Learning with a cross-domain benchmark. Additionally, we show that having a large amount of runs is crucial. With only conducting three runs as often done in the literature, the superiority of specific methods can strongly vary with the specific runs. This effect is so strong, that, depending on the seed, even a well-established method's performance can be significantly better and significantly worse than random for the same dataset.
Related papers
- Towards Comparable Active Learning [6.579888565581481]
We show that the reported lifts in recent literature generalize poorly to other domains leading to an inconclusive landscape in Active Learning research.
This paper addresses these issues by providing an Active Learning framework for a fair comparison of algorithms across different tasks and domains, as well as a fast and perform oracleant algorithm for evaluation.
arXiv Detail & Related papers (2023-11-30T08:54:32Z) - Active Learning Principles for In-Context Learning with Large Language
Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z) - Active Teacher for Semi-Supervised Object Detection [80.10937030195228]
We propose a novel algorithm called Active Teacher for semi-supervised object detection (SSOD)
Active Teacher extends the teacher-student framework to an iterative version, where the label set is partially and gradually augmented by evaluating three key factors of unlabeled examples.
With this design, Active Teacher can maximize the effect of limited label information while improving the quality of pseudo-labels.
arXiv Detail & Related papers (2023-03-15T03:59:27Z) - ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named as ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z) - A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z) - Effective Evaluation of Deep Active Learning on Image Classification
Tasks [10.27095298129151]
We present a unified re-implementation of state-of-the-art active learning algorithms in the context of image classification.
On the positive side, we show that AL techniques are 2x to 4x more label-efficient compared to RS with the use of data augmentation.
arXiv Detail & Related papers (2021-06-16T23:29:39Z) - TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for
Unsupervised Sentence Embedding Learning [53.32740707197856]
We present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE)
It can achieve up to 93.1% of the performance of in-domain supervised approaches.
arXiv Detail & Related papers (2021-04-14T17:02:18Z) - Towards Few-Shot Fact-Checking via Perplexity [40.11397284006867]
We propose a new way of utilizing the powerful transfer learning ability of a language model via a perplexity score.
Our methodology can already outperform the Major Class baseline by more than absolute 10% on the F1-Macro metric.
We construct and publicly release two new fact-checking datasets related to COVID-19.
arXiv Detail & Related papers (2021-03-17T09:43:19Z) - Semi-supervised Active Learning for Instance Segmentation via Scoring
Predictions [25.408505612498423]
We propose a novel and principled semi-supervised active learning framework for instance segmentation.
Specifically, we present an uncertainty sampling strategy named Triplet Scoring Predictions (TSP) to explicitly incorporate samples ranking clues from classes, bounding boxes and masks.
Results on medical images datasets demonstrate that the proposed method results in the embodiment of knowledge from available data in a meaningful way.
arXiv Detail & Related papers (2020-12-09T02:36:52Z) - ALdataset: a benchmark for pool-based active learning [1.9308522511657449]
Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm could achieve good accuracy with less training samples by interactively querying a user/oracle to label new data points.
Pool-based AL is well-motivated in many ML tasks, where unlabeled data is abundant, but their labels are hard to obtain.
We present experiment results for various active learning strategies, both recently proposed and classic highly-cited methods, and draw insights from the results.
arXiv Detail & Related papers (2020-10-16T04:37:29Z) - Multi-Task Incremental Learning for Object Detection [71.57155077119839]
Multi-task learns multiple tasks, while sharing knowledge and computation among them.
It suffers from catastrophic forgetting of previous knowledge when learned incrementally without access to the old data.
arXiv Detail & Related papers (2020-02-13T04:58:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.