Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training
- URL: http://arxiv.org/abs/2305.12634v2
- Date: Thu, 19 Oct 2023 02:13:51 GMT
- Title: Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training
- Authors: Zhisong Zhang, Emma Strubell, Eduard Hovy
- Abstract summary: We propose a pragmatic method that reduces the annotation cost for structured label spaces using active learning.
Our approach leverages partial annotation, which reduces labeling costs by selecting only the most informative sub-structures for annotation.
We also utilize self-training to incorporate the current model's automatic predictions as pseudo-labels for un-annotated sub-structures.
- Score: 16.740101757982828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we propose a pragmatic method that reduces the annotation cost
for structured label spaces using active learning. Our approach leverages
partial annotation, which reduces labeling costs for structured outputs by
selecting only the most informative sub-structures for annotation. We also
utilize self-training to incorporate the current model's automatic predictions
as pseudo-labels for un-annotated sub-structures. A key challenge in
effectively combining partial annotation with self-training to reduce
annotation cost is determining which sub-structures to select to label. To
address this challenge, we adopt an error estimator to adaptively decide the
partial selection ratio according to the current model's capability. In
evaluations spanning four structured prediction tasks, we show that our
combination of partial annotation and self-training using an adaptive selection
ratio reduces annotation cost over strong full annotation baselines under a
fair comparison scheme that takes reading time into consideration.
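To make the approach concrete, here is a minimal sketch of one active learning round, assuming a token-level sequence labeling task and margin-based uncertainty. The names (`predict_marginals`, `annotator.label`, the dev-set error estimator) are illustrative assumptions, not the authors' code:

```python
import numpy as np

def estimate_error_rate(model, dev_sentences, dev_labels):
    """Stand-in for the paper's error estimator: the fraction of held-out
    sub-structures (tokens) that the current model mislabels."""
    wrong, total = 0, 0
    for sent, gold in zip(dev_sentences, dev_labels):
        pred = model.predict_marginals(sent).argmax(axis=-1)
        wrong += int((pred != np.asarray(gold)).sum())
        total += len(gold)
    return wrong / max(total, 1)

def partial_annotation_round(model, batch, annotator, dev):
    """One AL round: humans label only the most uncertain tokens; the
    model's own predictions serve as pseudo-labels for the rest."""
    ratio = estimate_error_rate(model, *dev)        # adaptive selection ratio
    training_examples = []
    for sentence in batch:
        probs = model.predict_marginals(sentence)   # (n_tokens, n_labels), hypothetical API
        preds = probs.argmax(axis=-1)
        top2 = np.sort(probs, axis=-1)[:, -2:]
        margins = top2[:, 1] - top2[:, 0]           # small margin = uncertain token
        k = max(1, int(ratio * len(margins)))       # annotate more when error is high
        labels = preds.copy()                       # self-training pseudo-labels...
        for i in np.argsort(margins)[:k]:           # ...except the k queried tokens
            labels[i] = annotator.label(sentence, i)  # human partial annotation
        training_examples.append((sentence, labels))
    return training_examples
```

The key design choice is the adaptive ratio: when the estimated error rate is high, more sub-structures are routed to human annotators; as the model improves, more of its pseudo-labels are trusted.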
Related papers
- Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation [4.846839863393725]
We propose Sub-SA (Submodular Selective Annotation), a submodular-based selective annotation method.
The aim of Sub-SA is to reduce annotation costs while improving the quality of in-context examples.
We also propose RPR (Reward and Penalty Regularization) to better balance the diversity and representativeness of the unlabeled dataset.
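Submodular objectives admit a simple greedy algorithm with a (1 - 1/e) approximation guarantee, which is what makes this kind of selection tractable. Below is a generic greedy facility-location sketch; it is a stand-in for Sub-SA's actual objective, not the paper's formulation, and omits the RPR term:

```python
import numpy as np

def greedy_facility_location(embeddings, budget):
    """Greedy maximization of f(S) = sum_i max_{j in S} sim(i, j), a generic
    monotone submodular objective with a (1 - 1/e) greedy guarantee. This is
    a stand-in for Sub-SA's objective and omits the RPR regularization."""
    sim = embeddings @ embeddings.T                 # pairwise similarity
    selected, coverage = [], np.zeros(len(embeddings))
    for _ in range(budget):
        # Marginal gain of adding each candidate j to the selected set.
        gains = np.maximum(sim, coverage).sum(axis=1) - coverage.sum()
        gains[selected] = -np.inf                   # never re-select
        j = int(np.argmax(gains))
        selected.append(j)
        coverage = np.maximum(coverage, sim[j])     # best coverage so far per item
    return selected
```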
arXiv Detail & Related papers (2024-07-08T07:47:30Z)
- Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation [2.0411082897313984]
This study introduces a novel methodology that integrates human annotators and Large Language Models (LLMs).
The proposed framework integrates human annotation with the output of LLMs, depending on the model uncertainty levels.
The empirical results show a substantial decrease in the costs associated with data annotation while either maintaining or improving model accuracy.
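A minimal sketch of this kind of uncertainty-gated routing follows; the entropy gate and threshold are assumptions for illustration, not the study's exact criterion:

```python
import math

def route_annotation(llm_probs, human_annotate, text, threshold=1.0):
    """Send an example to a human annotator when the LLM's predictive entropy
    exceeds a threshold; otherwise keep the LLM's top label. The entropy gate
    and threshold are illustrative, not the study's exact criterion."""
    entropy = -sum(p * math.log(p) for p in llm_probs.values() if p > 0)
    if entropy > threshold:
        return human_annotate(text)              # uncertain: pay for a human label
    return max(llm_probs, key=llm_probs.get)     # confident: trust the LLM
```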
arXiv Detail & Related papers (2024-06-17T21:45:48Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
Across multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit semi-supervised supervision.
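The setting reduces each annotation to a single yes/no answer. A minimal sketch, where `model.predict` and `oracle.is_label` are hypothetical interfaces:

```python
import numpy as np

def one_bit_round(model, images, oracle):
    """One round of one-bit supervision: the model guesses a class and the
    annotator answers yes or no. `model.predict` and `oracle.is_label`
    are hypothetical interfaces."""
    positives, negatives = [], []
    for x in images:
        probs = model.predict(x)            # (n_classes,) class probabilities
        guess = int(np.argmax(probs))
        if oracle.is_label(x, guess):       # one bit of human feedback
            positives.append((x, guess))    # confirmed label: full supervision
        else:
            negatives.append((x, guess))    # wrong guess: exclude this class
    return positives, negatives

def suppress_negatives(probs, banned_classes):
    """Negative label suppression: zero out rejected classes and renormalize
    before using the distribution as a semi-supervised pseudo-label."""
    probs = probs.copy()
    probs[list(banned_classes)] = 0.0
    return probs / probs.sum()
```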
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
- IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models [66.32043210237768]
This paper introduces an influence-driven selective annotation method.
It aims to minimize annotation costs while improving the quality of in-context examples.
Experiments confirm the superiority of the proposed method on various benchmarks.
arXiv Detail & Related papers (2023-10-16T22:53:54Z)
- Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns [3.1318537187387787]
We propose a novel approach to determine annotation strategies for segmentation datasets.
Our method sequentially determines the proportions of segmentation and classification annotations to collect for each fraction of the budget.
Our experiments show that the resulting annotations perform very close to optimal across a range of annotation budgets and datasets.
arXiv Detail & Related papers (2023-03-21T08:41:54Z)
- Active Learning for Abstractive Text Summarization [50.79416783266641]
We propose the first effective query strategy for Active Learning in abstractive text summarization.
We show that using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores.
arXiv Detail & Related papers (2023-01-09T10:33:14Z)
- Query-Adaptive Predictive Inference with Partial Labels [0.0]
We propose a new methodology to construct predictive sets using only partially labeled data on top of black-box predictive models.
Our experiments highlight the validity of our predictive set construction as well as the attractiveness of a more flexible user-dependent loss framework.
arXiv Detail & Related papers (2022-06-15T01:48:42Z)
- Active Learning with Weak Supervision for Gaussian Processes [12.408125305560274]
We propose an active learning algorithm that selects the precision of the annotation that is acquired.
We empirically demonstrate the gains of being able to adjust the annotation precision in the active learning loop.
arXiv Detail & Related papers (2022-04-18T14:27:31Z)
- Active Learning for Coreference Resolution using Discrete Annotation [76.36423696634584]
We improve upon pairwise annotation for active learning in coreference resolution.
We ask annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per human-annotation hour.
arXiv Detail & Related papers (2020-04-28T17:17:11Z)
- Self-Supervised Tuning for Few-Shot Segmentation [82.32143982269892]
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.
Existing meta-learning methods tend to fail to generate category-specific discriminative descriptors when the visual features extracted from support images are marginalized in the embedding space.
This paper presents an adaptive tuning framework in which the distribution of latent features across different episodes is dynamically adjusted based on a self-segmentation scheme.
arXiv Detail & Related papers (2020-04-12T03:53:53Z)
- Structured Prediction with Partial Labelling through the Infimum Loss [85.4940853372503]
The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect.
This is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one.
This paper provides a unified framework based on structured prediction and on the concept of infimum loss to deal with partial labelling.
arXiv Detail & Related papers (2020-03-02T13:59:41Z)
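The infimum loss itself is compact enough to state in a few lines: supervision for a datapoint is a candidate set S known to contain the true label, and the model is charged only for its best-matching candidate, L(S, z) = min_{y in S} l(y, z). A minimal sketch of this core idea:

```python
def infimum_loss(candidate_labels, prediction, base_loss):
    """Infimum loss for partial labelling: supervision is a set of candidate
    labels containing the true one, and the model is charged only for its
    best-matching candidate:  L(S, z) = min_{y in S} base_loss(y, z)."""
    return min(base_loss(y, prediction) for y in candidate_labels)

# Example with a 0/1 base loss: the prediction matches one candidate, so the loss is 0.
zero_one = lambda y, z: float(y != z)
print(infimum_loss({"NOUN", "PROPN"}, "PROPN", zero_one))  # -> 0.0
```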
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.