NERetrieve: Dataset for Next Generation Named Entity Recognition and
Retrieval
- URL: http://arxiv.org/abs/2310.14282v1
- Date: Sun, 22 Oct 2023 12:23:00 GMT
- Title: NERetrieve: Dataset for Next Generation Named Entity Recognition and
Retrieval
- Authors: Uri Katz, Matan Vetzler, Amir DN Cohen, Yoav Goldberg
- Abstract summary: We argue that capabilities provided by large language models are not the end of NER research, but rather an exciting beginning.
We present three variants of the NER task, together with a dataset to support them.
We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types.
- Score: 49.827932299460514
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recognizing entities in texts is a central need in many information-seeking
scenarios, and indeed, Named Entity Recognition (NER) is arguably one of the
most successful examples of a widely adopted NLP task and corresponding NLP
technology. Recent advances in large language models (LLMs) appear to provide
effective solutions (also) for NER tasks that were traditionally handled with
dedicated models, often matching or surpassing the abilities of the dedicated
models. Should NER be considered a solved problem? We argue to the contrary:
the capabilities provided by LLMs are not the end of NER research, but rather
an exciting beginning. They allow taking NER to the next level, tackling
increasingly more useful, and increasingly more challenging, variants. We
present three variants of the NER task, together with a dataset to support
them. The first is a move towards more fine-grained -- and intersectional --
entity types. The second is a move towards zero-shot recognition and extraction
of these fine-grained types based on entity-type labels. The third, and most
challenging, is the move from the recognition setup to a novel retrieval setup,
where the query is a zero-shot entity type, and the expected result is all the
sentences from a large, pre-indexed corpus that contain entities of these
types, and their corresponding spans. We show that all of these are far from
being solved. We provide a large, silver-annotated corpus of 4 million
paragraphs covering 500 entity types, to facilitate research towards all of
these three goals.
Related papers
- In-Context Learning for Few-Shot Nested Named Entity Recognition [53.55310639969833]
We introduce an effective and innovative ICL framework for the setting of few-shot nested NER.
We improve the ICL prompt by devising a novel example demonstration selection mechanism, EnDe retriever.
In EnDe retriever, we employ contrastive learning to perform three types of representation learning, in terms of semantic similarity, boundary similarity, and label similarity.
arXiv Detail & Related papers (2024-02-02T06:57:53Z) - Named Entity Recognition via Machine Reading Comprehension: A Multi-Task
Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z) - PromptNER: Prompting For Named Entity Recognition [27.501500279749475]
We introduce PromptNER, a new state-of-the-art algorithm for few-Shot and cross-domain NER.
PromptNER achieves a 4% (absolute) improvement in F1 score on the ConLL dataset, a 9% (absolute) improvement on the GENIA dataset, and a 4% (absolute) improvement on the FewNERD dataset.
arXiv Detail & Related papers (2023-05-24T07:38:24Z) - PromptNER: A Prompting Method for Few-shot Named Entity Recognition via
k Nearest Neighbor Search [56.81939214465558]
We propose PromptNER: a novel prompting method for few-shot NER via k nearest neighbor search.
We use prompts that contains entity category information to construct label prototypes, which enables our model to fine-tune with only the support set.
Our approach achieves excellent transfer learning ability, and extensive experiments on the Few-NERD and CrossNER datasets demonstrate that our model achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2023-05-20T15:47:59Z) - DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System
for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER RNum2 shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in the MultiCoNER RNum1 either incorporate the knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z) - IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z) - Dynamic Named Entity Recognition [5.9401550252715865]
We introduce a new task: Dynamic Named Entity Recognition (DNER)
DNER provides a framework to better evaluate the ability of algorithms to extract entities by exploiting the context.
We evaluate baseline models and present experiments reflecting issues and research axes related to this novel task.
arXiv Detail & Related papers (2023-02-16T15:50:02Z) - Unified Named Entity Recognition as Word-Word Relation Classification [25.801945832005504]
We present a novel alternative by modeling the unified NER as word-word relation classification, namely W2NER.
The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words.
Based on the W2NER scheme we develop a neural framework, in which the unified NER is modeled as a 2D grid of word pairs.
arXiv Detail & Related papers (2021-12-19T06:11:07Z) - A Sequence-to-Set Network for Nested Named Entity Recognition [38.05786148160635]
We propose a novel sequence-to-set neural network for nested NER.
We use a non-autoregressive decoder to predict the final set of entities in one pass.
Experimental results show that our proposed model achieves state-of-the-art on three nested NER corpora.
arXiv Detail & Related papers (2021-05-19T03:10:04Z) - Few-NERD: A Few-Shot Named Entity Recognition Dataset [35.669024917327825]
We present Few-NERD, a large-scale human-annotated few-shot NER dataset with a hierarchy of 8 coarse-grained and 66 fine-grained entity types.
Few-NERD consists of 188,238 sentences from Wikipedia, 4,601,160 words are included and each is annotated as context or a part of a two-level entity type.
arXiv Detail & Related papers (2021-05-16T15:53:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.