A Survey on Cost Types, Interaction Schemes, and Annotator Performance
Models in Selection Algorithms for Active Learning in Classification
- URL: http://arxiv.org/abs/2109.11301v1
- Date: Thu, 23 Sep 2021 11:17:50 GMT
- Authors: Marek Herde, Denis Huseljic, Bernhard Sick, Adrian Calma
- Abstract summary: Pool-based active learning aims to optimize the annotation process.
An AL strategy queries annotations intelligently from annotators to train a high-performance classification model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pool-based active learning (AL) aims to optimize the annotation process
(i.e., labeling) as the acquisition of annotations is often time-consuming and
therefore expensive. For this purpose, an AL strategy queries annotations
intelligently from annotators to train a high-performance classification model
at a low annotation cost. Traditional AL strategies operate in an idealized
framework. They assume a single, omniscient annotator who never gets tired and
charges uniformly regardless of query difficulty. However, in real-world
applications, we often face human annotators, e.g., crowd or in-house workers,
who make annotation mistakes and can be reluctant to respond if tired or faced
with complex queries. Recently, a wide range of novel AL strategies has been
proposed to address these issues. They differ in at least one of the following
three central aspects from traditional AL: (1) They explicitly consider
(multiple) human annotators whose performances can be affected by various
factors, such as missing expertise. (2) They generalize the interaction with
human annotators by considering different query and annotation types, such as
asking an annotator for feedback on an inferred classification rule. (3) They
take more complex cost schemes regarding annotations and misclassifications
into account. This survey provides an overview of these AL strategies and
refers to them as real-world AL. Therefore, we introduce a general real-world
AL strategy as part of a learning cycle and use its elements, e.g., the query
and annotator selection algorithm, to categorize about 60 real-world AL
strategies. Finally, we outline possible directions for future research in the
field of AL.
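The learning cycle described in the abstract can be illustrated with a minimal sketch: a pool-based AL loop that combines uncertainty-based query selection with a simple annotator performance model (a fixed mislabeling probability per annotator). This is an illustrative assumption, not an implementation from the survey; the dataset, the logistic-regression learner, and the `query_annotation` helper are all hypothetical.

```python
# Minimal sketch of a pool-based active learning cycle with multiple,
# possibly error-prone annotators. All names and parameters here are
# illustrative, not taken from the surveyed strategies.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y_true = make_classification(n_samples=300, n_features=5, random_state=0)

# Two hypothetical annotators with different error rates (a very simple
# annotator performance model: one fixed flip probability per annotator).
annotator_noise = [0.05, 0.30]

def query_annotation(idx, annotator):
    """Return a possibly incorrect binary label from the chosen annotator."""
    label = y_true[idx]
    if rng.random() < annotator_noise[annotator]:
        label = 1 - label  # annotation mistake: flip the label
    return label

# Seed the labeled set with a few instances from each class.
labeled = list(rng.choice(np.where(y_true == 0)[0], 5, replace=False))
labeled += list(rng.choice(np.where(y_true == 1)[0], 5, replace=False))
labels = {i: query_annotation(i, annotator=0) for i in labeled}
clf = LogisticRegression(max_iter=1000)

for _ in range(20):  # 20 AL iterations
    clf.fit(X[labeled], [labels[i] for i in labeled])
    pool = [i for i in range(len(X)) if i not in labels]
    # Query selection: pick the pool instance whose predicted class
    # probability is closest to 0.5 (uncertainty sampling).
    proba = clf.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]
    # Annotator selection: naively prefer the more reliable annotator;
    # real-world AL strategies trade reliability off against cost.
    labels[query] = query_annotation(query, annotator=0)
    labeled.append(query)

print(clf.score(X, y_true))
```

Real-world AL strategies replace each of these placeholder choices: richer annotator performance models (e.g., expertise- or fatigue-dependent error rates), other query types, and cost-aware selection of both the query and the annotator.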
Related papers
- Unified Active Retrieval for Retrieval Augmented Generation [69.63003043712696]
In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal.
Existing active retrieval methods face two challenges: (1) they usually rely on a single criterion, which struggles to handle various types of instructions; (2) they depend on specialized and highly differentiated procedures, so combining them makes the RAG system more complicated.
arXiv Detail & Related papers (2024-06-18T12:09:02Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - ALE: A Simulation-Based Active Learning Evaluation Framework for the
Parameter-Driven Comparison of Query Strategies for NLP [3.024761040393842]
Active Learning (AL) proposes promising data points for annotators to annotate next, instead of a subsequent or random sample.
This method is supposed to save annotation effort while maintaining model performance.
We introduce a reproducible active learning evaluation framework for the comparative evaluation of AL strategies in NLP.
arXiv Detail & Related papers (2023-08-01T10:42:11Z) - Active Learning for Natural Language Generation [17.14395724301382]
We present a first systematic study of active learning for Natural Language Generation.
Our results indicate that the performance of existing AL strategies is inconsistent.
We highlight some notable differences between the classification and generation scenarios, and analyze the selection behaviors of existing AL strategies.
arXiv Detail & Related papers (2023-05-24T11:27:53Z) - Active Learning for Abstractive Text Summarization [50.79416783266641]
We propose the first effective query strategy for Active Learning in abstractive text summarization.
We show that using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores.
arXiv Detail & Related papers (2023-01-09T10:33:14Z) - ImitAL: Learning Active Learning Strategies from Synthetic Data [14.758287202278918]
Active Learning is a well-known standard method for efficiently obtaining labeled data.
We propose ImitAL, a novel query strategy, which encodes AL as a learning-to-rank problem.
We show that our approach is more runtime performant than most other strategies, especially on very large datasets.
arXiv Detail & Related papers (2021-08-17T15:03:31Z) - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit
Reasoning Strategies [78.68534915690404]
StrategyQA is a benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy.
We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts.
Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
arXiv Detail & Related papers (2021-01-06T19:14:23Z) - Online Learning Demands in Max-min Fairness [91.37280766977923]
We describe mechanisms for the allocation of a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof.
The mechanism is repeated for multiple rounds and a user's requirements can change on each round.
At the end of each round, users provide feedback about the allocation they received, enabling the mechanism to learn user preferences over time.
arXiv Detail & Related papers (2020-12-15T22:15:20Z) - Active Imitation Learning with Noisy Guidance [6.832341432995627]
Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks.
Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state.
We consider an active learning setting in which the learning algorithm has additional access to a much cheaper noisy heuristic that provides noisy guidance.
arXiv Detail & Related papers (2020-05-26T15:35:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.