Focusing on Potential Named Entities During Active Label Acquisition
- URL: http://arxiv.org/abs/2111.03837v3
- Date: Wed, 14 Jun 2023 00:32:17 GMT
- Title: Focusing on Potential Named Entities During Active Label Acquisition
- Authors: Ali Osman Berk Sapci, Oznur Tastan, Reyyan Yeniterzi
- Abstract summary: Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text.
Many domain-specific NER applications still call for a substantial amount of labeled data.
We propose a better data-driven normalization approach to penalize sentences that are too long or too short.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Named entity recognition (NER) aims to identify mentions of named entities in
an unstructured text and classify them into predefined named entity classes.
While deep learning-based pre-trained language models help to achieve good
predictive performances in NER, many domain-specific NER applications still
call for a substantial amount of labeled data. Active learning (AL), a general
framework for the label acquisition problem, has been used for NER tasks to
minimize the annotation cost without sacrificing model performance. However,
the heavily imbalanced class distribution of tokens introduces challenges in
designing effective AL querying methods for NER. We propose several AL sentence
query evaluation functions that pay more attention to potential positive
tokens, and evaluate these proposed functions with both sentence-based and
token-based cost evaluation strategies. We also propose a better data-driven
normalization approach to penalize sentences that are too long or too short.
Our experiments on three datasets from different domains reveal that the
proposed approach reduces the number of annotated tokens while achieving better
or comparable prediction performance with conventional methods.
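As a rough illustration of the idea in the abstract, the sketch below scores an unlabeled sentence for annotation by weighting each token's uncertainty with an estimated probability that the token belongs to a named entity, and applies a simple length penalty. The function names, the `target_len` parameter, and the exponential penalty are illustrative assumptions, not the paper's actual data-driven normalization.

```python
import math

def sentence_query_score(token_uncertainties, positive_probs, target_len=20.0):
    """Score an unlabeled sentence for active-learning annotation.

    token_uncertainties: per-token uncertainty (e.g. entropy) from the current model.
    positive_probs: per-token probability of being part of a named entity,
        used to focus the query on potential positive (entity) tokens.
    target_len: illustrative reference length; the paper instead derives
        the normalization from the data.
    """
    # Weight each token's uncertainty by how likely it is to be an entity token,
    # so sentences dominated by easy 'O' tokens are not over-selected.
    focused = sum(u * p for u, p in zip(token_uncertainties, positive_probs))

    # Penalize sentences that are much longer or shorter than the reference length.
    length_penalty = math.exp(-abs(len(token_uncertainties) - target_len) / target_len)
    return focused * length_penalty

# Example: rank two candidate sentences by query score.
s1 = sentence_query_score([0.9, 0.2, 0.8], [0.7, 0.1, 0.6])
s2 = sentence_query_score([0.4] * 40, [0.05] * 40)
print(s1 > s2)  # the short, entity-dense sentence is preferred
```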
Related papers
- Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models [0.0]
Few-shot prompting, or in-context learning, enables models to recognize entities with minimal examples.
We assess state-of-the-art models like GPT-4 in NER tasks, comparing their few-shot performance to fully supervised benchmarks.
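For illustration, a minimal sketch of few-shot (in-context) prompting for NER of the kind evaluated there; the prompt template, label set, and example sentences are made up, and `call_llm` in the final comment stands in for whichever model API (e.g., GPT-4) is used.

```python
def build_few_shot_ner_prompt(examples, sentence):
    """Assemble an in-context NER prompt from a handful of labeled examples."""
    lines = ["Extract PERSON, ORG and LOC entities from the sentence."]
    for text, entities in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Entities: {entities}")
    lines.append(f"Sentence: {sentence}")
    lines.append("Entities:")
    return "\n".join(lines)

examples = [
    ("Marie Curie worked in Paris.", "Marie Curie=PERSON; Paris=LOC"),
    ("Apple hired Tim Cook.", "Apple=ORG; Tim Cook=PERSON"),
]
prompt = build_few_shot_ner_prompt(examples, "Alan Turing studied at Cambridge.")
print(prompt)
# The prompt would then be sent to the model, e.g. response = call_llm(prompt)
```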
arXiv Detail & Related papers (2024-08-28T13:42:28Z) - A Unified Label-Aware Contrastive Learning Framework for Few-Shot Named Entity Recognition [6.468625143772815]
We propose a unified label-aware token-level contrastive learning framework.
Our approach enriches the context by utilizing label semantics as suffix prompts.
It simultaneously optimizes context-native and context-label contrastive learning objectives.
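A generic sketch of a token-label contrastive objective of this flavor, assuming token embeddings from an encoder and label embeddings obtained by encoding label-name prompts; this is a simplified InfoNCE-style stand-in, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def token_label_contrastive_loss(token_emb, token_labels, label_emb, temperature=0.1):
    """Pull token embeddings toward the embedding of their own label and push them
    away from other labels (a simplified InfoNCE-style objective)."""
    token_emb = F.normalize(token_emb, dim=-1)      # (num_tokens, d)
    label_emb = F.normalize(label_emb, dim=-1)      # (num_labels, d)
    logits = token_emb @ label_emb.T / temperature  # similarity to every label
    return F.cross_entropy(logits, token_labels)

# Toy usage: 4 tokens, 3 labels (e.g. O, PER, LOC), 8-dim embeddings.
tokens = torch.randn(4, 8)
labels = torch.tensor([0, 1, 0, 2])
label_vectors = torch.randn(3, 8)  # could come from encoding label-name prompts
print(token_label_contrastive_loss(tokens, labels, label_vectors))
```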
arXiv Detail & Related papers (2024-04-26T06:19:21Z) - An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models [55.01592097059969]
Supervised finetuning on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities of large language models.
Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool.
We propose using experimental design to circumvent the computational bottlenecks of active learning.
arXiv Detail & Related papers (2024-01-12T16:56:54Z) - Revisiting Sparse Retrieval for Few-shot Entity Linking [33.15662306409253]
We propose an ELECTRA-based keyword extractor to denoise the mention context and construct a better query expression.
For training the extractor, we propose a distant supervision method to automatically generate training data based on overlapping tokens between mention contexts and entity descriptions.
Experimental results on the ZESHEL dataset demonstrate that the proposed method outperforms state-of-the-art models by a significant margin across all test domains.
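A toy sketch of the overlap signal behind that distant supervision idea: tokens shared between a mention context and an entity description are treated as keyword labels. The stopword list and example strings are made up, and the actual extractor in the paper is a learned ELECTRA model rather than this rule.

```python
def overlap_keywords(mention_context, entity_description, stopwords=None):
    """Distantly label keywords: tokens shared between the mention context
    and the entity description serve as a rough proxy for informative query terms."""
    stopwords = stopwords or {"the", "a", "an", "of", "in", "and"}
    ctx = [t.lower() for t in mention_context.split()]
    desc = {t.lower() for t in entity_description.split()} - stopwords
    return [t for t in ctx if t in desc and t not in stopwords]

print(overlap_keywords(
    "The striker joined the club after the 2010 world cup",
    "A football club based in Madrid that won the world cup era trophies",
))  # ['club', 'world', 'cup']
```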
arXiv Detail & Related papers (2023-10-19T03:51:10Z) - Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z) - PromptNER: A Prompting Method for Few-shot Named Entity Recognition via k Nearest Neighbor Search [56.81939214465558]
We propose PromptNER: a novel prompting method for few-shot NER via k nearest neighbor search.
We use prompts that contain entity category information to construct label prototypes, which enables our model to fine-tune with only the support set.
Our approach achieves excellent transfer learning ability, and extensive experiments on the Few-NERD and CrossNER datasets demonstrate that our model achieves superior performance over state-of-the-art methods.
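A minimal sketch of the prototype-plus-nearest-neighbor idea, assuming token embeddings from an encoder; averaging support embeddings into per-class prototypes and assigning the closest one is a simplified stand-in for the paper's prompt-based prototypes and k nearest neighbor search.

```python
import numpy as np

def build_prototypes(support_embeddings, support_labels):
    """Average the support-set embeddings per entity class to form label prototypes."""
    prototypes = {}
    for label in set(support_labels):
        vecs = [e for e, l in zip(support_embeddings, support_labels) if l == label]
        prototypes[label] = np.mean(vecs, axis=0)
    return prototypes

def nearest_prototype(query_embedding, prototypes):
    """Assign the label whose prototype is closest to the query token embedding."""
    return min(prototypes, key=lambda label: np.linalg.norm(query_embedding - prototypes[label]))

# Toy example with 2-d embeddings standing in for encoder outputs.
support = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
labels = ["PER", "PER", "LOC"]
protos = build_prototypes(support, labels)
print(nearest_prototype(np.array([0.8, 0.2]), protos))  # PER
```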
arXiv Detail & Related papers (2023-05-20T15:47:59Z) - Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z) - Global Pointer: Novel Efficient Span-based Approach for Named Entity Recognition [7.226094340165499]
The named entity recognition (NER) task aims to identify entities in a piece of text that belong to predefined semantic types.
State-of-the-art solutions for flat-entity NER commonly struggle to capture the fine-grained semantic information in the underlying text.
We propose a novel span-based NER framework, namely Global Pointer (GP), that leverages the relative positions through a multiplicative attention mechanism.
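A simplified sketch of multiplicative span scoring in that spirit: start and end token representations are projected and combined bilinearly to score every candidate span. The actual Global Pointer additionally injects relative position information (e.g., via rotary-style embeddings), which is omitted here.

```python
import torch

def span_scores(hidden, w_start, w_end):
    """Score every (start, end) span with a multiplicative (bilinear) interaction
    between start and end token representations; one score matrix per entity type."""
    starts = hidden @ w_start   # (seq_len, d)
    ends = hidden @ w_end       # (seq_len, d)
    return starts @ ends.T      # (seq_len, seq_len); entry [i, j] scores span i..j

seq_len, d = 6, 16
hidden = torch.randn(seq_len, d)           # contextual token representations
w_start, w_end = torch.randn(d, d), torch.randn(d, d)
scores = span_scores(hidden, w_start, w_end)
print(scores.shape)  # torch.Size([6, 6]); the upper triangle holds valid spans
```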
arXiv Detail & Related papers (2022-08-05T09:19:46Z) - Active Pointly-Supervised Instance Segmentation [106.38955769817747]
We present an economical active learning setting named active pointly-supervised instance segmentation (APIS).
APIS starts with box-level annotations and iteratively samples a point within the box and asks if it falls on the object.
The model developed with these strategies yields consistent performance gain on the challenging MS-COCO dataset.
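A toy sketch of that query loop, with random point sampling and a lambda standing in for both the annotator and the selection strategies studied in the paper.

```python
import random

def query_points_in_box(box, is_on_object, rounds=3, seed=0):
    """Iteratively sample points inside a box annotation and ask an oracle
    (a stand-in for the human annotator) whether each point falls on the object."""
    rng = random.Random(seed)
    x0, y0, x1, y1 = box
    collected = []
    for _ in range(rounds):
        point = (rng.uniform(x0, x1), rng.uniform(y0, y1))
        collected.append((point, is_on_object(point)))
    return collected  # (point, yes/no) pairs become point-level supervision

# Toy oracle: the "object" occupies the left half of the box.
print(query_points_in_box((0, 0, 10, 10), lambda p: p[0] < 5))
```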
arXiv Detail & Related papers (2022-07-23T11:25:24Z) - Named Entity Recognition without Labelled Data: A Weak Supervision Approach [23.05371427663683]
This paper presents a simple but powerful approach to learn NER models in the absence of labelled data through weak supervision.
The approach relies on a broad spectrum of labelling functions to automatically annotate texts from the target domain.
The outputs of these labelling functions are then unified, and a sequence labelling model can finally be trained on the basis of this unified annotation.
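A minimal sketch of the labelling-function idea, with two made-up heuristics and a majority vote standing in for the paper's aggregation model.

```python
from collections import Counter

def lf_capitalized(tokens):
    """Labelling function: mark capitalized, non-initial tokens as entities."""
    return ["ENT" if t[0].isupper() and i > 0 else "O" for i, t in enumerate(tokens)]

def lf_gazetteer(tokens, gazetteer={"paris", "london"}):
    """Labelling function: mark tokens found in a small gazetteer."""
    return ["ENT" if t.lower() in gazetteer else "O" for t in tokens]

def aggregate(tokens, labelling_functions):
    """Combine noisy labelling functions by majority vote (the paper's unified
    annotation comes from a dedicated aggregation model; the vote is a stand-in)."""
    votes = [lf(tokens) for lf in labelling_functions]
    return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]

tokens = "She moved to Paris last year".split()
print(aggregate(tokens, [lf_capitalized, lf_gazetteer]))
# ['O', 'O', 'O', 'ENT', 'O', 'O']
```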
arXiv Detail & Related papers (2020-04-30T12:29:55Z) - Active Learning for Coreference Resolution using Discrete Annotation [76.36423696634584]
We improve upon pairwise annotation for active learning in coreference resolution.
We ask annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per human-annotation hour.
arXiv Detail & Related papers (2020-04-28T17:17:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.