Named Entity Recognition for Partially Annotated Datasets
- URL: http://arxiv.org/abs/2204.09081v1
- Date: Tue, 19 Apr 2022 18:17:09 GMT
- Title: Named Entity Recognition for Partially Annotated Datasets
- Authors: Michael Strobl, Amine Trabelsi and Osmar Zaiane
- Abstract summary: We compare three training strategies for partially annotated datasets and an approach for deriving new datasets for new entity classes from Wikipedia.
To verify that our data acquisition and training approaches are plausible, we manually annotated test datasets for two new classes, namely food and drugs.
- Score: 1.3750624267664153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The most common Named Entity Recognizers are sequence taggers trained on fully annotated corpora, i.e., corpora in which the class of every entity word is known. Partially annotated corpora, i.e., corpora in which only some entities of some types are annotated, are too noisy for training sequence taggers: the same entity may be annotated with its true type in one place but left unmarked in another, misleading the tagger. We therefore compare three training strategies for partially annotated datasets, along with an approach for deriving new datasets for new classes of entities from Wikipedia without time-consuming manual annotation. To verify that our data acquisition and training approaches are plausible, we manually annotated test datasets for two new classes, namely food and drugs.
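To make the noise problem concrete: in a partially annotated corpus an unmarked entity looks like an "O" (outside) token, so a naively trained tagger learns to suppress exactly the entities the corpus missed. A common mitigation, sketched below in PyTorch, is to exclude unannotated spans from the training loss rather than treating them as confident negatives. This is a generic illustration, not necessarily one of the three strategies compared in the paper; the tiny model, tag set, and masking rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy tag set: O plus a single FOOD class (food is one of the two new classes in the paper).
TAGS = {"O": 0, "B-FOOD": 1, "I-FOOD": 2}
IGNORE = -100  # label value excluded from the loss

class TinyTagger(nn.Module):
    """Minimal embedding + linear sequence tagger (illustrative only)."""
    def __init__(self, vocab_size=1000, dim=32, num_tags=len(TAGS)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, num_tags)

    def forward(self, token_ids):              # (batch, seq_len)
        return self.out(self.emb(token_ids))   # (batch, seq_len, num_tags)

def masked_targets(tags, annotated_mask):
    """Keep gold tags only where the corpus was actually annotated.

    In a partially annotated corpus, an 'O' on an unannotated span may really be
    an unmarked entity, so such tokens are dropped from the loss instead of being
    treated as confident negatives.
    """
    return torch.where(annotated_mask.bool(), tags, torch.full_like(tags, IGNORE))

# Hypothetical mini-batch: one sentence of six tokens where only tokens 2-3 were annotated.
token_ids = torch.tensor([[12, 7, 845, 846, 3, 99]])
gold_tags = torch.tensor([[0, 0, 1, 2, 0, 0]])    # O O B-FOOD I-FOOD O O
annotated = torch.tensor([[0, 0, 1, 1, 0, 0]])    # trust only the marked span

model = TinyTagger()
logits = model(token_ids)                          # shape (1, 6, 3)
loss_fn = nn.CrossEntropyLoss(ignore_index=IGNORE)
loss = loss_fn(logits.view(-1, len(TAGS)), masked_targets(gold_tags, annotated).view(-1))
loss.backward()
```

The same masking idea carries over to CRF or transformer taggers; the key point is that only tokens whose labels the corpus actually asserts contribute to the gradient.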
Related papers
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [51.02035914828596]
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
arXiv Detail & Related papers (2024-01-23T22:36:03Z) - From Categories to Classifiers: Name-Only Continual Learning by Exploring the Web [118.67589717634281]
Continual learning often relies on the availability of extensive annotated datasets, an assumption that is unrealistic in practice because annotation is time-consuming and costly.
We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation.
Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification.
arXiv Detail & Related papers (2023-11-19T10:43:43Z) - Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z) - Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
arXiv Detail & Related papers (2021-12-15T13:14:58Z) - Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
Without using any labeled information, our method produces KRISSBERT, a universal entity linker for four million UMLS entities.
arXiv Detail & Related papers (2021-12-15T05:05:12Z) - Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z) - Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion (a toy sketch of this constrained-decoding idea follows the list below).
arXiv Detail & Related papers (2020-10-02T10:13:31Z) - Joint Embedding in Named Entity Linking on Sentence Level [30.229263131244906]
We propose a new unified embedding method by maximizing the relationships learned from knowledge graphs.
We focus on how to link entities for mentions at the sentence level, which reduces the noise introduced by different appearances of the same mention in a document.
arXiv Detail & Related papers (2020-02-12T12:06:32Z)
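The Autoregressive Entity Retrieval entry above describes generating an entity's name token by token while constraining decoding to valid names. Below is a minimal, self-contained sketch of that general pattern using a word-level trie over a handful of made-up candidate names and a hypothetical scoring function; it only illustrates constrained greedy decoding and is not GENRE's actual BART-based implementation.

```python
from collections import defaultdict

# Made-up candidate entity names (a tiny stand-in for a knowledge base of unique titles).
CANDIDATES = ["new york city", "new york times", "york university"]
END = "</e>"  # marks the end of a complete entity name

# Word-level trie: prefix tuple -> set of tokens that may legally follow it.
trie = defaultdict(set)
for name in CANDIDATES:
    tokens = name.split() + [END]
    for i in range(len(tokens)):
        trie[tuple(tokens[:i])].add(tokens[i])

def toy_score(mention, prefix, token):
    """Hypothetical scorer: prefers partial names that occur verbatim in the mention.
    A real system would use an autoregressive LM conditioned on the input text."""
    partial = " ".join(prefix + (token,)) if token != END else " ".join(prefix)
    return 1.0 if partial in mention else 0.1

def constrained_greedy_decode(mention):
    """Generate an entity name left to right, restricted to prefixes in the trie."""
    prefix = ()
    while True:
        allowed = sorted(trie[prefix])  # sort for deterministic tie-breaking
        best = max(allowed, key=lambda t: toy_score(mention, prefix, t))
        if best == END:
            return " ".join(prefix)
        prefix = prefix + (best,)

print(constrained_greedy_decode("she moved to new york city in 2010"))
# -> new york city
```

In a real system the scorer is a learned autoregressive language model conditioned on the input, and the trie covers every entity name in the knowledge base, so the search space stays restricted to valid entities while the ranking stays learned.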