ALE: A Simulation-Based Active Learning Evaluation Framework for the
Parameter-Driven Comparison of Query Strategies for NLP
- URL: http://arxiv.org/abs/2308.02537v1
- Date: Tue, 1 Aug 2023 10:42:11 GMT
- Title: ALE: A Simulation-Based Active Learning Evaluation Framework for the
Parameter-Driven Comparison of Query Strategies for NLP
- Authors: Philipp Kohl and Nils Freyer and Yoka Krämer and Henri Werth and
Steffen Wolf and Bodo Kraft and Matthias Meinecke and Albert Zündorf
- Abstract summary: Active Learning (AL) proposes promising data points for annotators to label next instead of annotating sequentially or at random.
This method is supposed to save annotation effort while maintaining model performance.
We introduce a reproducible active learning evaluation framework for the comparative evaluation of AL strategies in NLP.
- Score: 3.024761040393842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised machine learning and deep learning require large amounts
of labeled data, which data scientists obtain through a manual, time-consuming
annotation process. To mitigate this challenge, Active Learning (AL) proposes
promising data points for annotators to label next instead of annotating
sequentially or at random. This method is supposed to save annotation effort while
maintaining model performance. However, practitioners face many AL strategies
for different tasks and need an empirical basis to choose between them. Surveys
categorize AL strategies into taxonomies without performance indications.
Papers presenting novel AL strategies compare performance only against a small
subset of existing strategies. Our contribution addresses this gap by introducing a
reproducible active learning evaluation (ALE) framework for the comparative
evaluation of AL strategies in NLP. The framework allows the implementation of
AL strategies with low effort and a fair data-driven comparison through
defining and tracking experiment parameters (e.g., initial dataset size, number
of data points per query step, and the budget). ALE helps practitioners make
more informed decisions, and researchers can focus on developing new, effective
AL strategies and deriving best practices for specific use cases. With best
practices, practitioners can lower their annotation costs. We present a case
study to illustrate how to use the framework.
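The experiment parameters the abstract names (initial dataset size, data points per query step, and the total annotation budget) can be illustrated with a minimal, framework-agnostic simulation loop. The function and parameter names below are hypothetical, not ALE's actual API; a random baseline stands in for a real query strategy:

```python
import random

def simulate_al(pool, initial_size=100, step_size=50, budget=500, seed=42):
    """Minimal active-learning simulation loop.

    pool         -- list of unlabeled data points
    initial_size -- size of the seed set labeled before AL starts
    step_size    -- data points queried per AL iteration
    budget       -- total number of annotations allowed
    """
    rng = random.Random(seed)  # fixed seed keeps runs comparable
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    labeled = [unlabeled.pop() for _ in range(initial_size)]  # seed set

    while len(labeled) < budget and unlabeled:
        # A real query strategy would rank `unlabeled` by informativeness
        # here; this random draw is the usual baseline for comparison.
        n = min(step_size, len(unlabeled), budget - len(labeled))
        batch = [unlabeled.pop() for _ in range(n)]
        labeled.extend(batch)  # simulate annotating the queried batch
        # ... retrain the model on `labeled` and log metrics per step ...
    return labeled

queried = simulate_al(list(range(1000)))
print(len(queried))  # the loop stops exactly at the tracked budget: 500
```

Fixing these three parameters (plus the seed) across strategies is what makes runs of different query strategies comparable in a data-driven way.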
Related papers
- ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data [18.553222868627792]
In settings where only a budgeted amount of labeled data can be afforded, active learning seeks to devise query strategies for selecting the most informative data points to be labeled.
Numerous such query strategies have been proposed and compared in the active learning literature.
The community still lacks standardized benchmarks for comparing the performance of different query strategies.
arXiv Detail & Related papers (2024-06-25T07:14:14Z)
- Parameter-Efficient Active Learning for Foundational models [7.799711162530711]
Foundational vision transformer models have shown impressive few-shot performance on many vision tasks.
This research presents a novel investigation into the application of parameter-efficient fine-tuning methods within an active learning (AL) framework.
arXiv Detail & Related papers (2024-06-13T16:30:32Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective strategy is to select samples based on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement with the predicted label.
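For reference, the uncertainty-based selection mentioned above is commonly implemented as least-confidence sampling; the sketch below shows that standard baseline (LDM itself is a different, disagreement-based quantity), with hypothetical names and toy model outputs:

```python
def least_confidence_query(probs, k=2):
    """Rank unlabeled samples by predictive uncertainty.

    probs -- list of per-sample class-probability lists from the model
    k     -- number of samples to query
    Returns indices of the k samples whose top predicted class has the
    lowest probability, i.e. where the model is least confident.
    """
    confidence = [max(p) for p in probs]  # top-class probability per sample
    ranked = sorted(range(len(probs)), key=lambda i: confidence[i])
    return ranked[:k]

# Toy model outputs for four unlabeled samples (3-class task):
probs = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],  # most uncertain
]
print(least_confidence_query(probs))  # → [3, 1]
```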
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Active Learning for Abstractive Text Summarization [50.79416783266641]
We propose the first effective query strategy for Active Learning in abstractive text summarization.
We show that using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores.
arXiv Detail & Related papers (2023-01-09T10:33:14Z)
- An Efficient Active Learning Pipeline for Legal Text Classification [2.462514989381979]
We propose a pipeline for effectively using active learning with pre-trained language models in the legal domain.
We use knowledge distillation to guide the model's embeddings to a semantically meaningful space.
Our experiments on Contract-NLI, adapted to the classification task, and LEDGAR benchmarks show that our approach outperforms standard AL strategies.
arXiv Detail & Related papers (2022-11-15T13:07:02Z)
- Active Pointly-Supervised Instance Segmentation [106.38955769817747]
We present an economic active learning setting, named active pointly-supervised instance segmentation (APIS).
APIS starts with box-level annotations and iteratively samples a point within the box and asks if it falls on the object.
The model developed with these strategies yields consistent performance gain on the challenging MS-COCO dataset.
arXiv Detail & Related papers (2022-07-23T11:25:24Z)
- Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes random sampling strategy in acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories.
arXiv Detail & Related papers (2022-01-07T02:46:35Z)
- Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning [113.05118113697111]
Few-shot learning aims to adapt knowledge learned from previous tasks to novel tasks with only a limited amount of labeled data.
Research literature on few-shot learning exhibits great diversity, while different algorithms often excel at different few-shot learning scenarios.
We present Meta Navigator, a framework that attempts to solve the limitation in few-shot learning by seeking a higher-level strategy.
arXiv Detail & Related papers (2021-09-13T07:20:01Z)
- Cartography Active Learning [12.701925701095968]
We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm.
CAL exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling.
Our results show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.
arXiv Detail & Related papers (2021-09-09T14:02:02Z)
- Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective in keyword detection tasks in the regime when only few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.