Learning In-context Learning for Named Entity Recognition
- URL: http://arxiv.org/abs/2305.11038v3
- Date: Fri, 26 May 2023 05:44:00 GMT
- Title: Learning In-context Learning for Named Entity Recognition
- Authors: Jiawei Chen, Yaojie Lu, Hongyu Lin, Jie Lou, Wei Jia, Dai Dai, Hua Wu,
Boxi Cao, Xianpei Han and Le Sun
- Abstract summary: Named entity recognition in real-world applications suffers from the diversity of entity types, the emergence of new entity types, and the lack of high-quality annotations.
This paper proposes an in-context learning-based NER approach, which can effectively inject in-context NER ability into PLMs.
We show that our method can effectively inject in-context NER ability into PLMs and significantly outperform the PLMs+fine-tuning counterparts.
- Score: 54.022036267886214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition in real-world applications suffers from the
diversity of entity types, the emergence of new entity types, and the lack of
high-quality annotations. To address the above problems, this paper proposes an
in-context learning-based NER approach, which can effectively inject in-context
NER ability into PLMs and recognize entities of novel types on-the-fly using
only a few demonstrative instances. Specifically, we model PLMs as a
meta-function $\lambda_{\text{instruction, demonstrations, text}}.\,\mathcal{M}$,
and a new entity extractor can be implicitly constructed by applying a new
instruction and demonstrations to PLMs, i.e.,
$(\lambda.\mathcal{M})(\text{instruction, demonstrations}) \to \mathcal{F}$,
where $\mathcal{F}$ is a new entity extractor, i.e.,
$\mathcal{F}: \text{text} \to \text{entities}$. To inject the
above in-context NER ability into PLMs, we propose a meta-function pre-training
algorithm, which pre-trains PLMs by comparing the (instruction,
demonstration)-initialized extractor with a surrogate golden extractor.
Experimental results on 4 few-shot NER datasets show that our method can
effectively inject in-context NER ability into PLMs and significantly
outperform the PLMs+fine-tuning counterparts.
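As a rough illustration of the meta-function view described above, the sketch below constructs an entity extractor $\mathcal{F}$ by applying an instruction and a few demonstrations to a generic text-to-text PLM and parsing its output. The prompt template, demonstration format, output format, and the `plm_generate` wrapper are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of (lambda.M)(instruction, demonstrations) -> F, F: text -> entities.
# The prompt/demonstration/output formats below are assumptions for illustration;
# `plm_generate` stands in for any text-to-text PLM interface.

from typing import Callable, List, Tuple

Extractor = Callable[[str], List[Tuple[str, str]]]  # text -> [(mention, type), ...]

def build_extractor(instruction: str,
                    demonstrations: List[Tuple[str, List[Tuple[str, str]]]],
                    plm_generate: Callable[[str], str]) -> Extractor:
    """Apply (instruction, demonstrations) to a PLM and return an extractor F."""
    demo_block = "\n".join(
        f"Text: {text}\nEntities: " + "; ".join(f"{m} is {t}" for m, t in ents)
        for text, ents in demonstrations
    )

    def extract(text: str) -> List[Tuple[str, str]]:
        prompt = f"{instruction}\n{demo_block}\nText: {text}\nEntities:"
        output = plm_generate(prompt)  # e.g. "influenza is disease"
        entities = []
        for piece in output.split(";"):
            if " is " in piece:
                mention, etype = piece.rsplit(" is ", 1)
                entities.append((mention.strip(), etype.strip()))
        return entities

    return extract

# Hypothetical usage: recognize a novel entity type from two demonstrations.
# extractor = build_extractor(
#     "Extract the named entities of the given types from the text.",
#     [("Aspirin treats migraine.", [("migraine", "disease")]),
#      ("He was diagnosed with diabetes.", [("diabetes", "disease")])],
#     plm_generate=my_plm,  # hypothetical text-to-text model wrapper
# )
# extractor("Symptoms of influenza include fever.")  # -> [("influenza", "disease")]
```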
Related papers
- NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data [41.94295877935867]
We show how to create NuNER, a compact language representation model specialized in the Named Entity Recognition task.
NuNER can be fine-tuned to solve downstream NER problems in a data-efficient way.
We find that the size and entity-type diversity of the pre-training dataset are key to achieving good performance.
arXiv Detail & Related papers (2024-02-23T14:23:51Z) - CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large
Language Models for Data Annotation [94.59630161324013]
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale.
Our empirical study shows CoAnnotating to be an effective means of allocating annotation work, yielding up to a 21% performance improvement over a random baseline across different datasets.
arXiv Detail & Related papers (2023-10-24T08:56:49Z) - Named Entity Recognition via Machine Reading Comprehension: A Multi-Task
Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z) - GPT-NER: Named Entity Recognition via Large Language Models [58.609582116612934]
GPT-NER transforms the sequence labeling task to a generation task that can be easily adapted by Language Models.
We find that GPT-NER exhibits a greater ability in low-resource and few-shot setups, where the amount of training data is extremely scarce.
This demonstrates the capabilities of GPT-NER in real-world NER applications where the number of labeled examples is limited.
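As a rough sketch of this labeling-to-generation reformulation, the helper below rewrites a BIO-tagged sentence into a marked generation target for one entity type. The `@@ ... ##` entity markers are an assumed convention for illustration and may differ from GPT-NER's exact prompt design.

```python
# Illustrative only: recast sequence labeling as text generation by marking
# entity spans of one target type with assumed "@@ ... ##" delimiters.

def labels_to_generation(tokens, bio_tags, target_type="location"):
    """Turn BIO-tagged tokens into a marked generation target for one entity type."""
    out, inside = [], False
    for tok, tag in zip(tokens, bio_tags):
        if tag == f"B-{target_type}":
            if inside:            # close the previous entity before opening a new one
                out[-1] += "##"
            out.append("@@" + tok)
            inside = True
        elif tag == f"I-{target_type}" and inside:
            out.append(tok)
        else:
            if inside:            # close an open entity span
                out[-1] += "##"
                inside = False
            out.append(tok)
    if inside:                    # close a span that ends the sentence
        out[-1] += "##"
    return " ".join(out)

# labels_to_generation(["Columbus", "is", "a", "city"],
#                      ["B-location", "O", "O", "O"])
# -> "@@Columbus## is a city"
```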
arXiv Detail & Related papers (2023-04-20T16:17:26Z) - Representation Deficiency in Masked Language Modeling [107.39136254013042]
We propose MAE-LM, which pretrains the Masked Autoencoder architecture, where $\texttt{[MASK]}$ tokens are excluded from the encoder.
We show that MAE-LM consistently outperforms pretrained models across different pretraining settings and model sizes when fine-tuned on the GLUE and SQuAD benchmarks.
arXiv Detail & Related papers (2023-02-04T01:54:17Z) - Prompt-based Text Entailment for Low-Resource Named Entity Recognition [21.017890579840145]
We propose Prompt-based Text Entailment (PTE) for low-resource named entity recognition.
The proposed method achieves competitive performance on the CoNLL03 dataset.
arXiv Detail & Related papers (2022-11-06T06:13:38Z) - Decomposed Meta-Learning for Few-Shot Named Entity Recognition [32.515795881027074]
Few-shot named entity recognition (NER) systems aim at recognizing novel-class named entities based on only a few labeled examples.
We present an approach that tackles few-shot span detection and few-shot entity typing using meta-learning.
arXiv Detail & Related papers (2022-04-12T12:46:23Z) - Truth Discovery in Sequence Labels from Crowds [12.181422057560201]
Crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), have been deployed to assist in collecting such sequence labels.
Existing literature on annotation aggregation assumes that annotations are independent and thus faces challenges when handling sequential label aggregation tasks.
We propose an optimization-based method that infers the ground truth labels using annotations provided by workers for sequential labeling tasks.
arXiv Detail & Related papers (2021-09-09T19:12:13Z) - How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
These results underscore the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z)