Simple Questions Generate Named Entity Recognition Datasets
- URL: http://arxiv.org/abs/2112.08808v1
- Date: Thu, 16 Dec 2021 11:44:38 GMT
- Title: Simple Questions Generate Named Entity Recognition Datasets
- Authors: Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang
- Abstract summary: This work introduces an ask-to-generate approach, which automatically generates NER datasets by asking simple natural language questions.
Our models largely outperform previous weakly supervised models on six NER benchmarks across four different domains.
Formulating the needs of NER with natural language also allows us to build NER models for fine-grained entity types such as Award.
- Score: 18.743889213075274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition (NER) is a task of extracting named entities of
specific types from text. Current NER models often rely on human-annotated
datasets, which require extensive expert knowledge of the target domain and
entities. This work introduces an ask-to-generate approach, which
automatically generates NER datasets by asking simple natural language
questions that reflect the need for specific entity types (e.g., Which disease?) to an
open-domain question answering system. Without using any in-domain resources
(i.e., training sentences, labels, or in-domain dictionaries), our models
solely trained on our generated datasets largely outperform previous weakly
supervised models on six NER benchmarks across four different domains.
Surprisingly, on NCBI-disease, our model achieves a 75.5 F1 score, even
outperforming by 4.1 F1 the previous best weakly supervised model, which
utilizes a rich in-domain dictionary provided by domain experts. Formulating
the needs of NER with natural language also allows us to build NER models for
fine-grained entity types such as Award, where our model even outperforms fully
supervised models. On three few-shot NER benchmarks, our model achieves new
state-of-the-art performance.
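The ask-to-generate recipe described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's released code: `toy_qa` is a stand-in for a real open-domain QA system, and the `DISEASE` tag and example sentence are assumptions.

```python
def ask_to_generate(sentence, question, qa_system, entity_type):
    """Label the QA answer span in `sentence` as an entity of `entity_type` (BIO tags)."""
    tokens = sentence.split()
    labels = ["O"] * len(tokens)
    answer = qa_system(question, sentence)
    if answer:
        span = answer.split()
        # Locate the answer span among the tokens and tag it B-/I-.
        for i in range(len(tokens) - len(span) + 1):
            if tokens[i:i + len(span)] == span:
                labels[i] = f"B-{entity_type}"
                for j in range(i + 1, i + len(span)):
                    labels[j] = f"I-{entity_type}"
                break
    return list(zip(tokens, labels))

def toy_qa(question, context):
    # Hypothetical stand-in for an open-domain QA model.
    return "asthma" if "asthma" in context else None

pairs = ask_to_generate("Patients with asthma were enrolled .",
                        "Which disease?", toy_qa, "DISEASE")
# pairs[2] is ("asthma", "B-DISEASE"); every other token stays "O".
```

Running this over an unlabeled corpus with one question per entity type yields weakly labeled NER training data without any in-domain annotation.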
Related papers
- Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model [0.0]
Few-Shot Cross-Domain NER is the process of leveraging knowledge from data-rich source domains to perform entity recognition on data-scarce target domains.
We propose IF-WRANER, a retrieval augmented large language model for Named Entity Recognition.
arXiv Detail & Related papers (2024-11-01T08:57:29Z)
- ToNER: Type-oriented Named Entity Recognition with Generative Language Model [14.11486479935094]
We propose ToNER, a novel NER framework based on a generative model.
In ToNER, a type-matching model first identifies the entity types most likely to appear in the sentence.
We then add a multiple binary classification task to fine-tune the generative model's encoder, so as to generate a refined representation of the input sentence.
arXiv Detail & Related papers (2024-04-14T05:13:37Z)
- NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval [49.827932299460514]
We argue that capabilities provided by large language models are not the end of NER research, but rather an exciting beginning.
We present three variants of the NER task, together with a dataset to support them.
We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types.
arXiv Detail & Related papers (2023-10-22T12:23:00Z)
- PromptNER: A Prompting Method for Few-shot Named Entity Recognition via k Nearest Neighbor Search [56.81939214465558]
We propose PromptNER: a novel prompting method for few-shot NER via k nearest neighbor search.
We use prompts that contain entity category information to construct label prototypes, which enables our model to fine-tune with only the support set.
Our approach achieves excellent transfer learning ability, and extensive experiments on the Few-NERD and CrossNER datasets demonstrate that our model achieves superior performance over state-of-the-art methods.
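The label-prototype idea behind this kind of kNN approach can be sketched as follows. The function names, toy 2-d embeddings, and labels are illustrative assumptions; a real implementation would derive embeddings from prompt-encoded support examples via a pretrained language model.

```python
import math

def mean_vec(vectors):
    """Average equal-length vectors into one prototype vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def build_prototypes(support):
    # support: label -> list of prompt-derived embeddings for that label
    return {label: mean_vec(vecs) for label, vecs in support.items()}

def nearest_label(query, prototypes):
    """Assign the label whose prototype is closest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))

protos = build_prototypes({
    "PER": [[1.0, 0.0], [0.9, 0.1]],
    "LOC": [[0.0, 1.0], [0.1, 0.9]],
})
# A query embedding near the PER examples is assigned "PER".
```

Because prototypes are built only from the support set, no target-domain fine-tuning data beyond those few examples is needed.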
arXiv Detail & Related papers (2023-05-20T15:47:59Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Domain-Transferable Method for Named Entity Recognition Task [0.6040938686276304]
This paper describes a method to learn a domain-specific NER model for an arbitrary set of named entities.
We assume that the supervision can be obtained with no human effort, and neural models can learn from each other.
arXiv Detail & Related papers (2020-11-24T15:45:52Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
- One Model to Recognize Them All: Marginal Distillation from NER Models with Different Tag Sets [30.445201832698192]
Named entity recognition (NER) is a fundamental component in the modern language understanding pipeline.
This paper presents a marginal distillation (MARDI) approach for training a unified NER model from resources with disjoint or heterogeneous tag sets.
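The core of such marginal distillation can be sketched as follows: the student's distribution over a fine-grained tag set is marginalized onto each teacher's coarser tag set before the distributions are compared. The tag names and mapping below are illustrative assumptions, not MARDI's actual tag sets.

```python
import math

def marginalize(student_probs, tag_map):
    """Collapse the student's fine-grained tag distribution onto a teacher's tag set."""
    marg = {}
    for tag, p in student_probs.items():
        coarse = tag_map.get(tag, "O")  # tags outside this teacher's set fold into "O"
        marg[coarse] = marg.get(coarse, 0.0) + p
    return marg

def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) over the teacher's tag set; one distillation loss term."""
    return sum(p * math.log(p / max(student.get(tag, 0.0), eps))
               for tag, p in teacher.items() if p > 0.0)

# The student predicts over a fine tag set; this teacher only knows PER vs O.
student = {"PER": 0.5, "LOC": 0.2, "ORG": 0.1, "O": 0.2}
teacher_map = {"PER": "PER"}  # LOC and ORG lie outside this teacher's tag set
marg = marginalize(student, teacher_map)
# marg is approximately {"PER": 0.5, "O": 0.5}
```

Summing one such KL term per teacher lets a single student train against teachers with disjoint or heterogeneous tag sets.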
arXiv Detail & Related papers (2020-04-10T17:36:27Z)
- Zero-Resource Cross-Domain Named Entity Recognition [68.83177074227598]
Existing models for cross-domain named entity recognition rely on large unlabeled corpora or labeled NER training data in the target domain.
We propose a cross-domain NER model that does not use any external resources.
arXiv Detail & Related papers (2020-02-14T09:04:18Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.