UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
- URL: http://arxiv.org/abs/2308.03279v2
- Date: Fri, 19 Jan 2024 02:26:38 GMT
- Title: UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
- Authors: Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, Hoifung Poon
- Abstract summary: We show how ChatGPT can be distilled into much smaller UniversalNER models for open NER.
We assemble the largest NER benchmark to date, comprising 43 datasets across 9 diverse domains.
With a tiny fraction of the parameters, UniversalNER not only acquires ChatGPT's capability in recognizing arbitrary entity types, but also outperforms its NER accuracy by 7-9 absolute F1 points on average.
- Score: 48.977866466971655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated remarkable generalizability,
such as understanding arbitrary entities and relations. Instruction tuning has
proven effective for distilling LLMs into more cost-efficient models such as
Alpaca and Vicuna. Yet such student models still trail the original LLMs by
large margins in downstream applications. In this paper, we explore targeted
distillation with mission-focused instruction tuning to train student models
that can excel in a broad application class such as open information
extraction. Using named entity recognition (NER) as a case study, we show how
ChatGPT can be distilled into much smaller UniversalNER models for open NER.
For evaluation, we assemble the largest NER benchmark to date, comprising 43
datasets across 9 diverse domains such as biomedicine, programming, social
media, law, and finance. Without using any direct supervision, UniversalNER
attains remarkable NER accuracy across tens of thousands of entity types,
outperforming general instruction-tuned models such as Alpaca and Vicuna by
over 30 absolute F1 points on average. With a tiny fraction of the parameters,
UniversalNER not only acquires ChatGPT's capability in recognizing arbitrary
entity types, but also outperforms its NER accuracy by 7-9 absolute F1 points
on average. Remarkably, UniversalNER even outperforms state-of-the-art
multi-task instruction-tuned systems such as InstructUIE, which uses supervised
NER examples, by a large margin. We also conduct thorough ablation studies to
assess the impact of various components in our distillation approach. We
release the distillation recipe, data, and UniversalNER models to facilitate
future research on targeted distillation.
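To make the targeted-distillation recipe concrete, here is a minimal sketch of the data-construction step: the teacher LLM annotates unlabeled passages with open-ended entity types, and each type becomes one conversational instruction-tuning turn for the student. The `chat` helper, prompt wording, and JSON output format below are illustrative assumptions, not the paper's exact template.

```python
# Sketch of targeted distillation for open NER. A teacher LLM labels raw
# passages with whatever entity types it sees fit; each (passage, type,
# mentions) group becomes one instruction-tuning turn for the student.
import json

def chat(prompt: str) -> str:
    """Placeholder for a teacher-LLM call (e.g., an OpenAI-style API).
    Returns a canned annotation here so the sketch runs end to end."""
    return '[["Steve Jobs", "person"], ["Apple", "organization"]]'

ANNOTATE_PROMPT = (
    "Given the passage below, output a JSON list of [entity, type] pairs "
    "covering all named entities, using any type names you deem fitting.\n"
    "Passage: {passage}"
)

def build_training_turns(passage: str) -> list[dict]:
    pairs = json.loads(chat(ANNOTATE_PROMPT.format(passage=passage)))
    by_type: dict[str, list[str]] = {}
    for entity, etype in pairs:
        by_type.setdefault(etype, []).append(entity)
    # One turn per entity type, mirroring an "ask about one type at a
    # time" conversational format for tuning the student model.
    return [
        {"instruction": f"Text: {passage}\nWhat describes {etype} in the text?",
         "output": json.dumps(mentions)}
        for etype, mentions in by_type.items()
    ]

for turn in build_training_turns("Steve Jobs founded Apple."):
    print(turn["instruction"], "->", turn["output"])
```

The student is then fine-tuned on such turns with standard instruction tuning; any further refinements the paper ablates are omitted from this sketch.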
Related papers
- Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models [0.0]
Few-shot prompting, or in-context learning, enables models to recognize entities from only a handful of examples.
We assess state-of-the-art models like GPT-4 on NER tasks, comparing their few-shot performance to fully supervised benchmarks; a prompt-assembly sketch follows this entry.
arXiv Detail & Related papers (2024-08-28T13:42:28Z)
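As referenced in the entry above, a minimal sketch of few-shot prompt assembly for NER; the example sentences, entity lists, and output format are invented for illustration, not taken from the paper.

```python
# Toy few-shot NER prompt: labeled demonstrations are prepended before
# the query sentence so the LLM can infer the extraction format in-context.
import json

FEW_SHOT_EXAMPLES = [
    ("Barack Obama visited Paris.",
     {"person": ["Barack Obama"], "location": ["Paris"]}),
    ("Apple unveiled the iPhone in 2007.",
     {"organization": ["Apple"], "product": ["iPhone"]}),
]

def build_prompt(query: str, entity_types: list[str]) -> str:
    lines = [f"Extract entities of types {entity_types} as a JSON object."]
    for text, entities in FEW_SHOT_EXAMPLES:
        lines.append(f"Text: {text}\nEntities: {json.dumps(entities)}")
    lines.append(f"Text: {query}\nEntities:")
    return "\n\n".join(lines)

print(build_prompt("Tim Cook spoke in Berlin.", ["person", "location"]))
```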
- FsPONER: Few-shot Prompt Optimization for Named Entity Recognition in Domain-specific Scenarios [0.5106912532044251]
We introduce FsPONER, a novel approach for optimizing few-shot prompts, and evaluate its performance on domain-specific NER datasets.
FsPONER consists of three few-shot selection methods based on random sampling, TF-IDF, and a combination of both.
In the considered real-world scenarios with data scarcity, FsPONER with TF-IDF surpasses fine-tuned models by approximately 10% in F1 score; a sketch of the TF-IDF selection step follows this entry.
arXiv Detail & Related papers (2024-07-10T20:32:50Z)
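A minimal sketch of the TF-IDF-based selection idea from the FsPONER entry above, using scikit-learn; the paper's exact features, ranking, and combination with random sampling may differ.

```python
# Pick the k training examples most similar to the query under TF-IDF
# cosine similarity, to serve as few-shot demonstrations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_few_shot(query: str, pool: list[str], k: int = 5) -> list[str]:
    vectorizer = TfidfVectorizer()
    pool_vecs = vectorizer.fit_transform(pool)   # (n_pool, vocab) sparse
    query_vec = vectorizer.transform([query])    # (1, vocab) sparse
    scores = cosine_similarity(query_vec, pool_vecs)[0]
    top_k = scores.argsort()[::-1][:k]           # highest similarity first
    return [pool[i] for i in top_k]

pool = ["The FDA approved a new cancer drug.",
        "Stocks fell sharply on Tuesday.",
        "The court upheld the patent ruling."]
print(select_few_shot("Shares dropped after the earnings call.", pool, k=1))
```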
- GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks [0.0]
We introduce a new kind of GLiNER model that can be used for various information extraction tasks while remaining a small encoder model.
Our model achieves SoTA performance on zero-shot NER benchmarks and leading performance on question answering, summarization, and relation extraction tasks.
arXiv Detail & Related papers (2024-06-14T13:54:29Z)
- GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer [4.194768796374315]
Named Entity Recognition (NER) is essential in various Natural Language Processing (NLP) applications.
In this paper, we introduce a compact NER model trained to identify any type of entity.
Our model, GLiNER, facilitates parallel entity extraction, an advantage over the slow sequential token generation of Large Language Models (LLMs); a toy span-type matching sketch follows this entry.
arXiv Detail & Related papers (2023-11-14T20:39:12Z)
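To illustrate the parallel-extraction point in the GLiNER entry above, a toy sketch of span-type matching; real GLiNER derives both embedding sets from a bidirectional transformer, whereas random vectors stand in here.

```python
# All (span, type) pairs are scored in one matrix product, so extraction
# is parallel rather than token-by-token generation.
import numpy as np

rng = np.random.default_rng(0)
num_spans, num_types, dim = 12, 3, 64

span_embs = rng.standard_normal((num_spans, dim))  # candidate text spans
type_embs = rng.standard_normal((num_types, dim))  # entity-type labels

logits = span_embs @ type_embs.T                   # (num_spans, num_types)
probs = 1.0 / (1.0 + np.exp(-logits))              # independent sigmoid per pair
matches = np.argwhere(probs > 0.5)                 # accepted (span, type) pairs
print(f"{len(matches)} span-type matches out of {num_spans * num_types} pairs")
```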
- A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidence (learned by the model) for crowd-annotated NER; a generic confidence-weighting sketch follows this entry.
arXiv Detail & Related papers (2023-05-21T15:31:23Z)
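A generic sketch of the confidence-blending idea in the CPLL entry above; the mixing rule and loss below are illustrative stand-ins, not the paper's exact estimator.

```python
# Weight each candidate label's loss by a confidence that blends the
# annotators' prior confidence with the model's posterior probability.
import numpy as np

def confidence_weighted_nll(posterior: np.ndarray,
                            prior_conf: np.ndarray,
                            alpha: float = 0.5) -> float:
    """posterior: (n_tokens, n_labels) model probabilities per token;
    prior_conf: (n_tokens, n_labels) annotator confidence per candidate
    label, zero for labels no annotator proposed."""
    conf = alpha * prior_conf + (1.0 - alpha) * posterior
    conf /= conf.sum(axis=1, keepdims=True)          # renormalize per token
    return float(-(conf * np.log(posterior + 1e-9)).sum(axis=1).mean())

post = np.array([[0.7, 0.2, 0.1]])    # model's current belief
prior = np.array([[0.5, 0.5, 0.0]])   # two annotators disagreed
print(confidence_weighted_nll(post, prior))
```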
- Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprising a new loss function and a noisy-label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins; a generic label-removal sketch follows this entry.
arXiv Detail & Related papers (2021-09-10T17:19:56Z)
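A generic illustration of the noisy-label-removal step named in the entry above; the paper's actual removal criterion and loss function are more involved, so treat the threshold rule as a stand-in.

```python
# Drop distantly-labeled tokens whose assigned label gets low probability
# under the current model, then retrain on the surviving tokens.
import numpy as np

def keep_mask(token_probs: np.ndarray,
              distant_labels: np.ndarray,
              threshold: float = 0.7) -> np.ndarray:
    """token_probs: (n_tokens, n_labels) model probabilities;
    distant_labels: (n_tokens,) labels from distant supervision.
    Returns a boolean mask of tokens to keep for the next round."""
    own = token_probs[np.arange(len(distant_labels)), distant_labels]
    return own >= threshold

probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
labels = np.array([0, 0, 1])          # second token's distant label looks noisy
print(keep_mask(probs, labels))       # -> [ True False  True]
```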
- AutoTriggER: Label-Efficient and Robust Named Entity Recognition with Auxiliary Trigger Extraction [54.20039200180071]
We present a novel framework to improve NER performance by automatically generating and leveraging "entity triggers".
Our framework leverages post-hoc explanation to generate rationales and strengthens a model's prior knowledge using an embedding technique.
AutoTriggER shows strong label-efficiency, is capable of generalizing to unseen entities, and outperforms the RoBERTa-CRF baseline by nearly 0.5 F1 points on average.
arXiv Detail & Related papers (2021-09-10T08:11:56Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We establish new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that provides a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)