A Survey on Cost Types, Interaction Schemes, and Annotator Performance
Models in Selection Algorithms for Active Learning in Classification
- URL: http://arxiv.org/abs/2109.11301v1
- Date: Thu, 23 Sep 2021 11:17:50 GMT
- Authors: Marek Herde, Denis Huseljic, Bernhard Sick, Adrian Calma
- Abstract summary: Pool-based active learning aims to optimize the annotation process.
An AL strategy queries annotations intelligently from annotators to train a high-performance classification model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pool-based active learning (AL) aims to optimize the annotation process
(i.e., labeling) as the acquisition of annotations is often time-consuming and
therefore expensive. For this purpose, an AL strategy queries annotations
intelligently from annotators to train a high-performance classification model
at a low annotation cost. Traditional AL strategies operate in an idealized
framework. They assume a single, omniscient annotator who never gets tired and
charges uniformly regardless of query difficulty. However, in real-world
applications, we often face human annotators, e.g., crowd or in-house workers,
who make annotation mistakes and can be reluctant to respond if tired or faced
with complex queries. Recently, a wide range of novel AL strategies has been
proposed to address these issues. They differ in at least one of the following
three central aspects from traditional AL: (1) They explicitly consider
(multiple) human annotators whose performances can be affected by various
factors, such as missing expertise. (2) They generalize the interaction with
human annotators by considering different query and annotation types, such as
asking an annotator for feedback on an inferred classification rule. (3) They
take more complex cost schemes regarding annotations and misclassifications
into account. This survey provides an overview of these AL strategies and
refers to them as real-world AL. Therefore, we introduce a general real-world
AL strategy as part of a learning cycle and use its elements, e.g., the query
and annotator selection algorithm, to categorize about 60 real-world AL
strategies. Finally, we outline possible directions for future research in the
field of AL.
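The learning cycle described in the abstract can be illustrated with a minimal sketch: a pool-based AL loop that combines uncertainty-based query selection with a simple annotator performance model (a fixed mislabeling probability per annotator). This is an illustrative assumption, not an implementation from the survey; the dataset, the logistic-regression learner, and the `query_annotation` helper are all hypothetical.

```python
# Minimal sketch of a pool-based active learning cycle with multiple,
# possibly error-prone annotators. All names and parameters here are
# illustrative, not taken from the surveyed strategies.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y_true = make_classification(n_samples=300, n_features=5, random_state=0)

# Two hypothetical annotators with different error rates (a very simple
# annotator performance model: one fixed flip probability per annotator).
annotator_noise = [0.05, 0.30]

def query_annotation(idx, annotator):
    """Return a possibly incorrect binary label from the chosen annotator."""
    label = y_true[idx]
    if rng.random() < annotator_noise[annotator]:
        label = 1 - label  # annotation mistake: flip the label
    return label

# Seed the labeled set with a few instances from each class.
labeled = list(rng.choice(np.where(y_true == 0)[0], 5, replace=False))
labeled += list(rng.choice(np.where(y_true == 1)[0], 5, replace=False))
labels = {i: query_annotation(i, annotator=0) for i in labeled}
clf = LogisticRegression(max_iter=1000)

for _ in range(20):  # 20 AL iterations
    clf.fit(X[labeled], [labels[i] for i in labeled])
    pool = [i for i in range(len(X)) if i not in labels]
    # Query selection: pick the pool instance whose predicted class
    # probability is closest to 0.5 (uncertainty sampling).
    proba = clf.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]
    # Annotator selection: naively prefer the more reliable annotator;
    # real-world AL strategies trade reliability off against cost.
    labels[query] = query_annotation(query, annotator=0)
    labeled.append(query)

print(clf.score(X, y_true))
```

Real-world AL strategies replace each of these placeholder choices: richer annotator performance models (e.g., expertise- or fatigue-dependent error rates), other query types, and cost-aware selection of both the query and the annotator.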
Related papers
- Unified Active Retrieval for Retrieval Augmented Generation [69.63003043712696]
In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal.
Existing active retrieval methods face two challenges: (1) they usually rely on a single criterion, which struggles to handle various types of instructions; (2) they depend on specialized and highly differentiated procedures, so combining them makes the RAG system more complicated.
arXiv Detail & Related papers (2024-06-18T12:09:02Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - ALE: A Simulation-Based Active Learning Evaluation Framework for the
Parameter-Driven Comparison of Query Strategies for NLP [3.024761040393842]
Active Learning (AL) proposes promising data points for annotators to annotate next, instead of a subsequent or random sample.
This method is supposed to save annotation effort while maintaining model performance.
We introduce a reproducible active learning evaluation framework for the comparative evaluation of AL strategies in NLP.
arXiv Detail & Related papers (2023-08-01T10:42:11Z) - Active Learning for Natural Language Generation [17.14395724301382]
We present a first systematic study of active learning for Natural Language Generation.
Our results indicate that the performance of existing AL strategies is inconsistent.
We highlight some notable differences between the classification and generation scenarios, and analyze the selection behaviors of existing AL strategies.
arXiv Detail & Related papers (2023-05-24T11:27:53Z) - Active Learning for Abstractive Text Summarization [50.79416783266641]
We propose the first effective query strategy for Active Learning in abstractive text summarization.
We show that using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores.
arXiv Detail & Related papers (2023-01-09T10:33:14Z) - ImitAL: Learning Active Learning Strategies from Synthetic Data [14.758287202278918]
Active Learning is a well-known standard method for efficiently obtaining labeled data.
We propose ImitAL, a novel query strategy, which encodes AL as a learning-to-rank problem.
We show that our approach is more runtime performant than most other strategies, especially on very large datasets.
arXiv Detail & Related papers (2021-08-17T15:03:31Z) - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit
Reasoning Strategies [78.68534915690404]
StrategyQA is a benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy.
We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts.
Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
arXiv Detail & Related papers (2021-01-06T19:14:23Z) - Online Learning Demands in Max-min Fairness [91.37280766977923]
We describe mechanisms for the allocation of a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof.
The mechanism is repeated for multiple rounds and a user's requirements can change on each round.
At the end of each round, users provide feedback about the allocation they received, enabling the mechanism to learn user preferences over time.
arXiv Detail & Related papers (2020-12-15T22:15:20Z) - Active Imitation Learning with Noisy Guidance [6.832341432995627]
Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks.
Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state.
We consider an active learning setting in which the learning algorithm has additional access to a much cheaper noisy heuristic that provides noisy guidance.
arXiv Detail & Related papers (2020-05-26T15:35:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.