GPT-NER: Named Entity Recognition via Large Language Models
- URL: http://arxiv.org/abs/2304.10428v4
- Date: Sat, 7 Oct 2023 14:25:28 GMT
- Title: GPT-NER: Named Entity Recognition via Large Language Models
- Authors: Shuhe Wang, Xiaofei Sun, Xiaoya Li, Rongbin Ouyang, Fei Wu, Tianwei
Zhang, Jiwei Li, Guoyin Wang
- Abstract summary: GPT-NER transforms the sequence labeling task to a generation task that can be easily adapted by Language Models.
We find that GPT-NER exhibits a greater ability in low-resource and few-shot setups, where the amount of training data is extremely scarce.
This demonstrates the capabilities of GPT-NER in real-world NER applications where the number of labeled examples is limited.
- Score: 58.609582116612934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the fact that large-scale Language Models (LLMs) have
achieved SOTA performance on a variety of NLP tasks, their performance on NER
is still significantly below supervised baselines. This is due to the gap
between the two tasks: NER is a sequence labeling task in nature, while LLMs
are text-generation models.
In this paper, we propose GPT-NER to resolve this issue. GPT-NER bridges the
gap by transforming the sequence labeling task into a generation task that can
be easily adapted by LLMs: e.g., the task of finding location entities in the
input text "Columbus is a city" is transformed into generating the text
sequence "@@Columbus## is a city", where the special tokens @@ and ## mark the
entity to extract. To efficiently address the "hallucination" issue of LLMs,
where LLMs have a strong inclination to over-confidently label NULL inputs as
entities, we propose a self-verification strategy that prompts the LLM to ask
itself whether an extracted entity actually belongs to the labeled entity tag.
We conduct experiments on five widely adopted NER datasets, and GPT-NER
achieves performance comparable to fully supervised baselines, which, to the
best of our knowledge, is the first time this has been shown. More
importantly, we find that GPT-NER exhibits a greater ability in low-resource
and few-shot setups: when the amount of training data is extremely scarce,
GPT-NER performs significantly better than supervised models. This
demonstrates the capability of GPT-NER in real-world NER applications where
the number of labeled examples is limited.
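As a concrete illustration of the two ideas in the abstract, here is a minimal sketch in Python: a few-shot prompt in the @@...## marking format, a parser that recovers entities from the generated sequence, and the self-verification follow-up question. The prompt wording and the `llm` callable are illustrative assumptions, not the paper's exact templates.

```python
import re
from typing import Callable, List, Tuple

# Inverse of the marking transformation: anything the model wraps in
# @@...## is read back as an entity span.
MARK = re.compile(r"@@(.+?)##")

def build_prompt(sentence: str, entity_type: str,
                 demos: List[Tuple[str, str]]) -> str:
    """Few-shot prompt: each demo pairs a raw sentence with its marked
    version, e.g. ("Columbus is a city", "@@Columbus## is a city")."""
    lines = [f"Task: mark every {entity_type} entity with @@ and ##."]
    for raw, marked in demos:
        lines += [f"Input: {raw}", f"Output: {marked}"]
    lines += [f"Input: {sentence}", "Output:"]
    return "\n".join(lines)

def extract_entities(marked: str) -> List[str]:
    # extract_entities("@@Columbus## is a city") -> ["Columbus"]
    return MARK.findall(marked)

def self_verify(entity: str, sentence: str, entity_type: str,
                llm: Callable[[str], str]) -> bool:
    """Self-verification: ask the model whether an extracted span really
    is an entity of the target type, to filter hallucinated labels."""
    prompt = (f'In the sentence "{sentence}", is "{entity}" '
              f"a {entity_type} entity? Answer yes or no.")
    return llm(prompt).strip().lower().startswith("yes")
```

Any text-completion function can be plugged in as `llm`; entities that fail the yes/no check are simply dropped from the final prediction.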
Related papers
- ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models [0.0]
We present ReverseNER, a framework aimed at overcoming the limitations of large language models (LLMs) in zero-shot Named Entity Recognition tasks.
Rather than beginning with sentences, this method uses an LLM to generate entities based on their definitions and then expands them into full sentences.
This results in well-annotated sentences with clearly labeled entities, while preserving semantic and structural similarity to the task sentences.
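A rough sketch of that entity-first order, under stated assumptions: the prompt wording, the `llm` text-completion stub, and the @@...## marking convention (borrowed from GPT-NER above) are all illustrative, not ReverseNER's actual templates.

```python
from typing import Callable, List

def generate_entities(definition: str, n: int,
                      llm: Callable[[str], str]) -> List[str]:
    """Stage 1: generate candidate entities from a type definition."""
    prompt = (f"List {n} example entities that match this definition, "
              f"one per line:\n{definition}")
    return [e.strip() for e in llm(prompt).splitlines() if e.strip()][:n]

def expand_to_sentence(entity: str, entity_type: str,
                       llm: Callable[[str], str]) -> str:
    """Stage 2: expand each entity into a full, already-labeled sentence."""
    prompt = (f'Write one natural sentence that contains "{entity}" '
              f"as a {entity_type}, wrapping the entity in @@ and ##.")
    return llm(prompt)
```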
arXiv Detail & Related papers (2024-11-01T12:08:08Z)
- Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels [75.77877889764073]
Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels.
This study explores whether solely utilizing unlabeled data can elicit strong model capabilities.
We propose a new paradigm termed zero-to-strong generalization.
arXiv Detail & Related papers (2024-09-19T02:59:44Z)
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
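As a toy illustration of graph perturbation, the sketch below adds one step to a reasoning graph encoded as an adjacency list; the encoding and the single "add a node" operator are invented here, and the paper's actual graph representation and perturbation operators are richer.

```python
import random
from typing import Dict, List

def perturb(graph: Dict[str, List[str]], new_step: str) -> Dict[str, List[str]]:
    """Attach one new reasoning step to a random existing node, raising
    complexity while keeping the original structure intact."""
    out = {node: children[:] for node, children in graph.items()}
    parent = random.choice(list(out))
    out[parent].append(new_step)
    out[new_step] = []
    return out

# A two-step arithmetic chain gains a third dependent step.
g = {"a = 2 + 3": ["b = a * 4"], "b = a * 4": []}
print(perturb(g, "c = b - 1"))
```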
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
- NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data [41.94295877935867]
We show how to create NuNER, a compact language representation model specialized in the Named Entity Recognition task.
NuNER can be fine-tuned to solve downstream NER problems in a data-efficient way.
We find that the size and entity-type diversity of the pre-training dataset are key to achieving good performance.
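For orientation, fine-tuning a compact NER-specialized encoder looks like standard token classification; a minimal sketch with Hugging Face transformers follows, where the checkpoint ID and label count are assumptions rather than values from the paper.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

CKPT = "numind/NuNER-v1.0"  # hypothetical checkpoint ID for illustration
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForTokenClassification.from_pretrained(
    CKPT, num_labels=9)  # e.g. BIO tags for 4 entity types + O

# From here, standard token-classification fine-tuning applies:
# tokenize with is_split_into_words=True, align labels to subwords,
# and train on a small labeled set.
```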
arXiv Detail & Related papers (2024-02-23T14:23:51Z)
- GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer [4.194768796374315]
Named Entity Recognition (NER) is essential in various Natural Language Processing (NLP) applications.
In this paper, we introduce a compact NER model trained to identify any type of entity.
Our model, GLiNER, facilitates parallel entity extraction, an advantage over the slow sequential token generation of Large Language Models (LLMs).
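To see why this is parallel, consider a sketch in which every candidate span is scored against a label embedding independently, with no token-by-token decoding; the mean-pooling and dot-product scoring here are simplifications, not GLiNER's exact architecture.

```python
import torch

def score_spans(token_emb: torch.Tensor, label_emb: torch.Tensor,
                max_width: int = 4) -> dict:
    """token_emb: (seq_len, dim); label_emb: (dim,).
    Returns {(start, end): score} for all spans up to max_width; every
    span is scored independently, so all scores can be computed at once."""
    scores = {}
    n = token_emb.size(0)
    for width in range(1, max_width + 1):
        for start in range(n - width + 1):
            span = token_emb[start:start + width].mean(dim=0)  # pooled span
            scores[(start, start + width)] = torch.dot(span, label_emb).item()
    return scores
```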
arXiv Detail & Related papers (2023-11-14T20:39:12Z)
- NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval [49.827932299460514]
We argue that capabilities provided by large language models are not the end of NER research, but rather an exciting beginning.
We present three variants of the NER task, together with a dataset to support them.
We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types.
arXiv Detail & Related papers (2023-10-22T12:23:00Z)
- Label-free Node Classification on Graphs with Large Language Models (LLMS) [46.937442239949256]
This work introduces LLM-GNN, a pipeline for label-free node classification on graphs with Large Language Models.
It leverages the strengths of both GNNs and LLMs while mitigating their limitations.
In particular, LLM-GNN can achieve an accuracy of 74.9% on a vast-scale dataset at a cost of less than 1 dollar.
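A high-level sketch of such a pipeline, with every function name an illustrative stub: the LLM annotates a small subset of nodes, and a GNN trained on those pseudo-labels predicts the rest.

```python
from typing import Callable, Dict, List

def llm_gnn(nodes: List[str], annotate_ids: List[int],
            llm_label: Callable[[str], int],
            train_gnn: Callable[[Dict[int, int]], Callable[[int], int]]
            ) -> Dict[int, int]:
    """LLM as a cheap annotator, GNN as the generalizer."""
    pseudo = {i: llm_label(nodes[i]) for i in annotate_ids}  # LLM labels a subset
    predict = train_gnn(pseudo)                              # GNN fits pseudo-labels
    return {i: predict(i) for i in range(len(nodes))}        # label every node
```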
arXiv Detail & Related papers (2023-10-07T03:14:11Z)
- Label Supervised LLaMA Finetuning [13.939718306233617]
In this paper, we introduce a label-supervised adaptation for Large Language Models (LLMs).
We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss.
Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size.
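The projection step described above can be sketched directly; the last-token pooling below is an assumption, and any backbone that exposes final-layer hidden states would slot in.

```python
import torch
import torch.nn as nn

class LabelSupervisedHead(nn.Module):
    """Project final-layer latent representations into the label space
    and train with cross-entropy, per the summary above."""
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_labels)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, last_hidden: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
        # last_hidden: (batch, seq_len, hidden_dim) from the final LLaMA
        # layer; the last token's state stands in for the sequence here.
        logits = self.proj(last_hidden[:, -1, :])
        return self.loss_fn(logits, labels)
```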
arXiv Detail & Related papers (2023-10-02T13:53:03Z)
- Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performance on most NLP tasks is still well below supervised baselines.
In this work, we look into the causes and trace the subpar performance to several identifiable factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z)
- Gaussian Prior Reinforcement Learning for Nested Named Entity Recognition [52.46740830977898]
We propose a novel seq2seq model named GPRL, which formulates the nested NER task as an entity triplet sequence generation process.
Experiments on three nested NER datasets demonstrate that GPRL outperforms previous nested NER models.
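A one-function sketch of the triplet-sequence view: each entity becomes a (start, end, type) triplet and the generation target is their linearized concatenation, which lets nested (overlapping) spans coexist; the exact encoding here is invented for illustration.

```python
from typing import List, Tuple

def linearize(entities: List[Tuple[int, int, str]]) -> str:
    """[(0, 2, 'ORG'), (0, 5, 'FAC')] -> '0 2 ORG | 0 5 FAC'.
    Overlapping spans are fine, which is what makes the generation
    view suit nested NER."""
    return " | ".join(f"{s} {e} {t}" for s, e, t in sorted(entities))

print(linearize([(0, 2, "ORG"), (0, 5, "FAC")]))  # nested spans coexist
```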
arXiv Detail & Related papers (2023-05-12T05:55:34Z)