Scoping Review of Active Learning Strategies and their Evaluation Environments for Entity Recognition Tasks
- URL: http://arxiv.org/abs/2407.03895v1
- Date: Thu, 4 Jul 2024 12:40:35 GMT
- Title: Scoping Review of Active Learning Strategies and their Evaluation Environments for Entity Recognition Tasks
- Authors: Philipp Kohl, Yoka Krämer, Claudia Fohry, Bodo Kraft
- Abstract summary: We analyzed 62 relevant papers and identified 106 active learning strategies.
We grouped them into three categories: exploitation-based (60x), exploration-based (14x), and hybrid strategies (32x).
- Score: 0.6462260690750605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We conducted a scoping review for active learning in the domain of natural language processing (NLP), which we summarize in accordance with the PRISMA-ScR guidelines as follows:
  Objective: Identify active learning strategies that were proposed for entity recognition and their evaluation environments (datasets, metrics, hardware, execution time).
  Design: We used Scopus and ACM as our search engines. We compared the results with two literature surveys to assess the search quality. We included peer-reviewed English publications introducing or comparing active learning strategies for entity recognition.
  Results: We analyzed 62 relevant papers and identified 106 active learning strategies. We grouped them into three categories: exploitation-based (60x), exploration-based (14x), and hybrid strategies (32x). We found that all studies used the F1-score as an evaluation metric. Information about hardware (6x) and execution time (13x) was only occasionally included. The 62 papers used 57 different datasets to evaluate their respective strategies. Most datasets contained newspaper articles or biomedical/medical data. Our analysis revealed that 26 out of 57 datasets are publicly accessible.
  Conclusion: Numerous active learning strategies have been identified, along with significant open questions that still need to be addressed. Researchers and practitioners face difficulties when making data-driven decisions about which active learning strategy to adopt. Conducting comprehensive empirical comparisons using the evaluation environment proposed in this study could help establish best practices in the domain.
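To make the largest category concrete: exploitation-based strategies query the points the current model is least confident about. Below is a minimal, hedged sketch of least-confidence sampling for token-level entity recognition; the model interface (per-token softmax outputs) and all names are illustrative assumptions, not the implementation from any reviewed paper.

```python
import numpy as np

def least_confidence_scores(token_probs: list[np.ndarray]) -> np.ndarray:
    """Score each unlabeled sentence by model uncertainty.

    token_probs: one (num_tokens, num_labels) probability matrix per
    sentence, e.g. softmax outputs of a NER tagger (assumed interface).
    Returns one score per sentence; higher means more uncertain.
    """
    scores = []
    for probs in token_probs:
        # Confidence of the most likely label for each token.
        token_confidence = probs.max(axis=1)
        # Least confidence, averaged over tokens to normalize for length.
        scores.append(1.0 - token_confidence.mean())
    return np.asarray(scores)

def select_batch(token_probs: list[np.ndarray], batch_size: int) -> list[int]:
    """Return indices of the most uncertain sentences to annotate next."""
    scores = least_confidence_scores(token_probs)
    return np.argsort(-scores)[:batch_size].tolist()
```

An exploration-based strategy would instead score for coverage of the input space (e.g., embedding diversity), and hybrid strategies combine both signals.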
Related papers
- ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data [18.553222868627792]
In settings where only a budgeted amount of labeled data can be afforded, active learning seeks to devise query strategies for selecting the most informative data points to be labeled.
Numerous such query strategies have been proposed and compared in the active learning literature.
The community still lacks standardized benchmarks for comparing the performance of different query strategies.
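To illustrate what such a benchmark has to standardize, here is a minimal sketch of a shared query-strategy interface with a random baseline; the names and signatures are assumptions, not ALPBench's actual API.

```python
from typing import Protocol
import numpy as np

class QueryStrategy(Protocol):
    """Interface a benchmark can evaluate all strategies against."""

    def query(
        self,
        probas: np.ndarray,   # (n_unlabeled, n_classes) model predictions
        batch_size: int,      # labeling budget for this round
    ) -> np.ndarray:          # indices into the unlabeled pool
        ...

class RandomSampling:
    """Baseline: ignore the model and sample uniformly at random."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = np.random.default_rng(seed)

    def query(self, probas: np.ndarray, batch_size: int) -> np.ndarray:
        return self.rng.choice(len(probas), size=batch_size, replace=False)
```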
arXiv Detail & Related papers (2024-06-25T07:14:14Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
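A generic way to trade off the two signals (not BAL's actual algorithm) is to cluster the unlabeled pool for diversity and take the most uncertain point per cluster; the k-means choice and all names below are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_uncertain_batch(
    embeddings: np.ndarray,   # (n_unlabeled, dim) feature vectors
    uncertainty: np.ndarray,  # (n_unlabeled,) e.g. 1 - max class probability
    batch_size: int,
) -> list[int]:
    """Pick one high-uncertainty sample from each k-means cluster.

    Clustering spreads the batch across the input space (diversity);
    the per-cluster argmax keeps it informative (uncertainty).
    """
    clusters = KMeans(n_clusters=batch_size, n_init="auto").fit_predict(embeddings)
    batch = []
    for c in range(batch_size):
        members = np.flatnonzero(clusters == c)
        if members.size:  # guard against an empty cluster
            batch.append(int(members[np.argmax(uncertainty[members])]))
    return batch
```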
arXiv Detail & Related papers (2023-12-26T08:14:46Z)
- Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation [48.40853126077237]
It is labour-intensive to acquire large-scale point cloud data with point-wise labels.
Active learning is an effective strategy for reducing this labeling effort, but it remains under-explored.
This paper develops a hierarchical point-based active learning strategy.
arXiv Detail & Related papers (2023-08-22T03:52:05Z)
- ALE: A Simulation-Based Active Learning Evaluation Framework for the Parameter-Driven Comparison of Query Strategies for NLP [3.024761040393842]
Active Learning (AL) proposes promising data points to annotators, which they annotate next instead of a sequential or random sample.
This method is intended to save annotation effort while maintaining model performance.
We introduce a reproducible active learning evaluation framework for the comparative evaluation of AL strategies in NLP.
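In simulation-based evaluation, gold labels stand in for the human annotator, which makes runs reproducible. The sketch below shows the general idea with a fast linear model; the interfaces are assumptions and do not reflect ALE's actual API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def simulate_al_run(X, y, X_test, y_test, strategy,
                    seed_size=50, batch_size=50, rounds=10):
    """Simulated AL loop: `strategy(probas, batch_size)` returns
    positions in the unlabeled pool (assumed interface).
    Returns the macro-F1 learning curve across rounds."""
    labeled = list(range(seed_size))          # initial seed set
    pool = list(range(seed_size, len(X)))     # unlabeled pool
    curve = []
    for _ in range(rounds):
        model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        curve.append(f1_score(y_test, model.predict(X_test), average="macro"))
        probas = model.predict_proba(X[pool])
        picked = {pool[i] for i in strategy(probas, batch_size)}
        labeled.extend(picked)                # "annotate" via gold labels
        pool = [i for i in pool if i not in picked]
    return curve
```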
arXiv Detail & Related papers (2023-08-01T10:42:11Z)
- A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors [56.554277096170246]
We present an empirical study that evaluates and contrasts four commonly employed annotation methods in user studies focused on in-the-wild data collection.
For both the user-driven, in situ annotations, where participants annotate their activities during the actual recording process, and the recall methods, where participants retrospectively annotate their data at the end of each day, the participants had the flexibility to select their own set of activity classes and corresponding labels.
arXiv Detail & Related papers (2023-05-15T16:02:56Z)
- Less Is More: A Comparison of Active Learning Strategies for 3D Medical Image Segmentation [0.0]
A variety of active learning strategies have been proposed in the literature, but their effectiveness is highly dependent on the dataset and training scenario.
We evaluate the performance of several well-known active learning strategies on three datasets from the Medical Decathlon.
arXiv Detail & Related papers (2022-07-02T14:27:58Z)
- ALLSH: Active Learning Guided by Local Sensitivity and Hardness [98.61023158378407]
We propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function.
Our method achieves consistent gains over the commonly used active learning strategies in various classification tasks.
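One plausible reading of local sensitivity is how much the model's prediction shifts when the input is slightly perturbed (e.g., by paraphrasing or token dropout). The sketch below scores samples by that shift; the augmentation choice and names are assumptions, and ALLSH's actual acquisition function differs in detail.

```python
import numpy as np
from scipy.special import rel_entr

def local_sensitivity_scores(p_orig: np.ndarray, p_aug: np.ndarray) -> np.ndarray:
    """KL(p_orig || p_aug) per sample.

    p_orig, p_aug: (n, n_classes) predictive distributions for each
    unlabeled sample and for an augmented copy of it.
    Larger values mean the prediction is locally unstable, so the
    sample is a stronger candidate for labeling.
    """
    eps = 1e-12  # avoid log(0) for hard zero probabilities
    return rel_entr(p_orig + eps, p_aug + eps).sum(axis=1)
```

A hardness term (e.g., predictive entropy on the original input) can be added to the score to reflect the second half of the acquisition function.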
arXiv Detail & Related papers (2022-05-10T15:39:11Z)
- Rebuilding Trust in Active Learning with Actionable Metrics [77.99796068970569]
Active Learning (AL) is an active domain of research, but it is seldom used in industry despite pressing needs.
This is due in part to a misalignment of objectives: research strives for the best results on selected datasets, while practitioners need dependable gains in their own settings.
We present actionable metrics to help rebuild industrial practitioners' trust in Active Learning.
arXiv Detail & Related papers (2020-12-18T09:34:59Z)
- Learning active learning at the crossroads? evaluation and discussion [0.03807314298073299]
Active learning aims to reduce annotation cost by predicting which samples are useful for a human expert to label.
There is no best active learning strategy that consistently outperforms all others in all applications.
We present the results of a benchmark performed on 20 datasets that compares a strategy learned using a recent meta-learning algorithm with margin sampling.
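Margin sampling, the baseline in this comparison, queries the samples whose top two class probabilities are closest. A minimal sketch (names are illustrative):

```python
import numpy as np

def margin_scores(probas: np.ndarray) -> np.ndarray:
    """Difference between the two highest class probabilities per sample.

    probas: (n_unlabeled, n_classes). A small margin means the model
    is torn between two labels, so those samples are queried first.
    """
    part = np.partition(probas, -2, axis=1)  # top-2 values land at the end
    return part[:, -1] - part[:, -2]

def query_by_margin(probas: np.ndarray, batch_size: int) -> np.ndarray:
    return np.argsort(margin_scores(probas))[:batch_size]
```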
arXiv Detail & Related papers (2020-12-16T10:35:43Z)