Assisted Text Annotation Using Active Learning to Achieve High Quality
with Little Effort
- URL: http://arxiv.org/abs/2112.11914v1
- Date: Wed, 15 Dec 2021 13:14:58 GMT
- Authors: Franziska Weeber and Felix Hamborg and Karsten Donnay and Bela Gipp
- Abstract summary: We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large amounts of annotated data have become more important than ever,
especially since the rise of deep learning techniques. However, manual
annotations are costly. We propose a tool that enables researchers to create
large, high-quality, annotated datasets with only a few manual annotations,
thus strongly reducing annotation cost and effort. For this purpose, we combine
an active learning (AL) approach with a pre-trained language model to
semi-automatically identify annotation categories in the given text documents.
To highlight our research direction's potential, we evaluate the approach on
the task of identifying frames in news articles. Our preliminary results show
that employing AL strongly reduces the number of annotations for correct
classification of even these complex and subtle frames. On the framing dataset,
the AL approach needs only 16.3% of the annotations to reach the same
performance as a model trained on the full dataset.
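The core loop the abstract describes, repeatedly querying a human annotator for the documents the current model is least certain about, can be sketched as follows. This is a minimal illustration using margin-based uncertainty over a nearest-centroid classifier; the classifier, the function names, and the oracle interface are illustrative assumptions, not the paper's actual setup (which pairs AL with a pre-trained language model).

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """One centroid per class, computed from the currently labeled data."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def most_uncertain(X_pool, centroids):
    """Index of the pool item with the smallest margin between its two
    closest class centroids, i.e. the most ambiguous document."""
    d = np.stack([np.linalg.norm(X_pool - c, axis=1) for c in centroids])
    d_sorted = np.sort(d, axis=0)
    margin = d_sorted[1] - d_sorted[0]
    return int(np.argmin(margin))

def al_loop(X_lab, y_lab, X_pool, oracle, n_classes, budget):
    """Spend `budget` annotation queries: at each step, refit the model on
    the labeled set, pick the most uncertain pool item, and ask the oracle
    (the human annotator) for its label."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    pool_idx = list(range(len(X_pool)))
    for _ in range(budget):
        centroids = fit_centroids(X_lab, y_lab, n_classes)
        q = most_uncertain(X_pool[pool_idx], centroids)
        i = pool_idx.pop(q)
        X_lab = np.vstack([X_lab, X_pool[i:i + 1]])
        y_lab = np.append(y_lab, oracle(i))
    return X_lab, y_lab
```

The claim in the abstract corresponds to stopping this loop early: because each query targets the most informative document, far fewer oracle calls are needed to match the performance of training on the full dataset.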
Related papers
- Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation [2.0411082897313984]
This study introduces a novel methodology that integrates human annotators and Large Language Models.
The proposed framework integrates human annotation with the output of LLMs, depending on the model uncertainty levels.
The empirical results show a substantial decrease in the costs associated with data annotation while either maintaining or improving model accuracy.
arXiv Detail & Related papers (2024-06-17T21:45:48Z) - Zero-shot Active Learning Using Self Supervised Learning [11.28415437676582]
We propose a new Active Learning approach that is model agnostic and does not require an iterative process.
We aim to leverage self-supervised learnt features for the task of Active Learning.
arXiv Detail & Related papers (2024-01-03T11:49:07Z) - A Self Supervised StyleGAN for Image Annotation and Classification with
Extremely Limited Labels [35.43549147657739]
We propose SS-StyleGAN, a self-supervised approach for image annotation and classification suitable for extremely small annotated datasets.
We show that the proposed method attains strong classification results using small labeled datasets of sizes 50 and even 10.
arXiv Detail & Related papers (2023-12-26T09:46:50Z) - One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z) - Prefer to Classify: Improving Text Classifiers via Auxiliary Preference
Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z) - Active Learning for Abstractive Text Summarization [50.79416783266641]
We propose the first effective query strategy for Active Learning in abstractive text summarization.
We show that using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores.
arXiv Detail & Related papers (2023-01-09T10:33:14Z) - Annotation Error Detection: Analyzing the Past and Present for a More
Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z) - OPAD: An Optimized Policy-based Active Learning Framework for Document
Content Analysis [6.159771892460152]
We propose OPAD, a novel framework using reinforcement policy for active learning in content detection tasks for documents.
The framework learns the acquisition function to decide the samples to be selected while optimizing performance metrics.
We show superior performance of the proposed OPAD framework for active learning for various tasks related to document understanding.
arXiv Detail & Related papers (2021-10-01T07:40:56Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.