Taxonomy Expansion for Named Entity Recognition
- URL: http://arxiv.org/abs/2305.13191v1
- Date: Mon, 22 May 2023 16:23:46 GMT
- Title: Taxonomy Expansion for Named Entity Recognition
- Authors: Karthikeyan K, Yogarshi Vyas, Jie Ma, Giovanni Paolini, Neha Anna
John, Shuai Wang, Yassine Benajiba, Vittorio Castelli, Dan Roth, Miguel
Ballesteros
- Abstract summary: Training a Named Entity Recognition (NER) model often involves fixing a taxonomy of entity types.
A simple approach is to re-annotate the entire dataset with both existing and additional entity types.
We propose a novel approach called Partial Label Model (PLM) that uses only partially annotated datasets.
- Score: 65.49344005894996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training a Named Entity Recognition (NER) model often involves fixing a
taxonomy of entity types. However, requirements evolve and we might need the
NER model to recognize additional entity types. A simple approach is to
re-annotate the entire dataset with both existing and additional entity types and
then train the model on the re-annotated dataset. However, this is an extremely
laborious task. To remedy this, we propose a novel approach called Partial
Label Model (PLM) that uses only partially annotated datasets. We experiment
with 6 diverse datasets and show that PLM consistently performs better than
most other approaches (0.5 - 2.5 F1), including in novel settings for taxonomy
expansion not considered in prior work. The gap between PLM and all other
approaches is especially large in settings where there is limited data
available for the additional entity types (as much as 11 F1), thus suggesting a
more cost-effective approach to taxonomy expansion.
Related papers
- Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition [40.23783832224238]
We present B2NERD, a cohesive and efficient dataset for Open NER.
We detect inconsistent entity definitions across datasets and clarify them by distinguishable label names to construct a universal taxonomy of 400+ entity types.
Our B2NER models, trained on B2NERD, outperform GPT-4 by 6.8-12.0 F1 points and surpass previous methods in 3 out-of-domain benchmarks across 15 datasets and 6 languages.
arXiv Detail & Related papers (2024-06-17T03:57:35Z)
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [51.02035914828596]
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
arXiv Detail & Related papers (2024-01-23T22:36:03Z)
- TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation [48.75470418596875]
Training on large-scale datasets can boost the performance of video instance segmentation while the datasets for VIS are hard to scale up due to the high labor cost.
What we possess are numerous isolated field-specific datasets; it is therefore appealing to jointly train models across the aggregation of datasets to enhance data volume and diversity.
We conduct extensive evaluations on four popular and challenging benchmarks, including YouTube-VIS 2019, YouTube-VIS 2021, OVIS, and UVO.
Our model shows significant improvement over the baseline solutions, and sets new state-of-the-art records on all benchmarks.
arXiv Detail & Related papers (2023-12-11T18:50:09Z)
- From Ultra-Fine to Fine: Fine-tuning Ultra-Fine Entity Typing Models to Fine-grained [12.948753628039093]
A common way to address this problem is to use distantly annotated training data that contains incorrect labels.
We propose a new approach that avoids the need to create distantly labeled data whenever there is a new type schema.
arXiv Detail & Related papers (2023-12-11T08:12:01Z)
- MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction [78.61546292830081]
We construct a large-scale human-annotated ERE dataset MAVEN-ERE with improved annotation schemes.
It contains 103,193 event coreference chains, 1,216,217 temporal relations, 57,992 causal relations, and 15,841 subevent relations.
Experiments show that ERE on MAVEN-ERE is quite challenging, and that modeling relation interactions with joint learning can improve performance.
arXiv Detail & Related papers (2022-11-14T13:34:49Z)
- Automatic universal taxonomies for multi-domain semantic segmentation [1.4364491422470593]
Training semantic segmentation models on multiple datasets has sparked a lot of recent interest in the computer vision community.
Established datasets have mutually incompatible labels which disrupt principled inference in the wild.
We address this issue by automatic construction of universal taxonomies through iterative dataset integration.
arXiv Detail & Related papers (2022-07-18T08:53:17Z)
- ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking [5.382800665115746]
ReFinED is an efficient end-to-end entity linking model.
It performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass.
It surpasses state-of-the-art performance on standard entity linking datasets by an average of 3.7 F1.
arXiv Detail & Related papers (2022-07-08T19:20:42Z)
- Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment [100.19568734815732]
Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs.
Attribute triples can also provide a crucial alignment signal but have not been well explored yet.
We propose to utilize an attributed value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently.
arXiv Detail & Related papers (2020-10-07T08:03:58Z)
- Empower Entity Set Expansion via Language Model Probing [58.78909391545238]
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations.
We propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue.
arXiv Detail & Related papers (2020-04-29T00:09:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.