WikiGUM: Exhaustive Entity Linking for Wikification in 12 Genres
- URL: http://arxiv.org/abs/2109.07449v1
- Date: Wed, 15 Sep 2021 17:35:24 GMT
- Title: WikiGUM: Exhaustive Entity Linking for Wikification in 12 Genres
- Authors: Jessica Lin, Amir Zeldes
- Abstract summary: We present and evaluate WikiGUM, a fully wikified dataset covering all mentions of named entities.
The dataset covers a broad range of 12 written and spoken genres, most of which have not been included in Entity Linking efforts to date.
- Score: 6.619650459583443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous work on Entity Linking has focused on resources targeting non-nested
proper named entity mentions, often in data from Wikipedia, i.e. Wikification.
In this paper, we present and evaluate WikiGUM, a fully wikified dataset,
covering all mentions of named entities, including their non-named and
pronominal mentions, as well as mentions nested within other mentions. The
dataset covers a broad range of 12 written and spoken genres, most of which
have not been included in Entity Linking efforts to date, leading to poor
performance by a pretrained SOTA system in our evaluation. The availability of
a variety of other annotations for the same data also enables further research
on entities in context.
Related papers
- Entity Disambiguation via Fusion Entity Decoding [68.77265315142296]
We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions.
We observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
arXiv Detail & Related papers (2024-04-02T04:27:54Z)
- Taxonomy Expansion for Named Entity Recognition [65.49344005894996]
Training a Named Entity Recognition (NER) model often involves fixing a taxonomy of entity types.
A simple approach is to re-annotate the entire dataset with both existing and additional entity types.
We propose a novel approach called Partial Label Model (PLM) that uses only partially annotated datasets.
arXiv Detail & Related papers (2023-05-22T16:23:46Z)
- TempEL: Linking Dynamically Evolving and Newly Emerging Entities [50.980331847622026]
In our continuously evolving world, entities change over time and new, previously non-existing or unknown, entities appear.
We study how this evolutionary scenario impacts the performance on a well established entity linking (EL) task.
We introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022.
arXiv Detail & Related papers (2023-02-05T22:34:36Z)
- Building and Evaluating Universal Named-Entity Recognition English corpus [0.0]
This article presents the application of the Universal Named Entity framework to generate automatically annotated corpora.
By using a workflow that extracts Wikipedia data and meta-data and DBpedia information, we generated an English dataset which is described and evaluated.
arXiv Detail & Related papers (2022-12-14T11:32:24Z)
- The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are lacking, and named entities are ambiguous.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
arXiv Detail & Related papers (2022-08-31T21:51:55Z)
- EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery and Indexing [28.62173704769311]
Existing work on entity linking mostly assumes that the reference knowledge base is complete, and therefore all mentions can be linked.
This paper creates the Unknown Entity Discovery and Indexing benchmark, in which unknown entities (that is, entities without a description in the knowledge base and labeled mentions) have to be integrated into an existing entity linking system.
Building on dense-retrieval based entity linking, we introduce the end-to-end EDIN pipeline that detects, clusters, and indexes mentions of unknown entities in context.
arXiv Detail & Related papers (2022-05-25T08:29:39Z)
- Named Entity Recognition for Partially Annotated Datasets [1.3750624267664153]
We are comparing three training strategies for partially annotated datasets and an approach to derive new datasets for new classes of entities from Wikipedia.
To verify that our data acquisition and training approaches are plausible, we manually annotated test datasets for two new classes, namely food and drugs.
arXiv Detail & Related papers (2022-04-19T18:17:09Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
- Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
Without using any labeled information, our method produces KRISSBERT, a universal entity linker for four million UMLS entities.
arXiv Detail & Related papers (2021-12-15T05:05:12Z)
- MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain [76.21775236904185]
The dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities.
A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types.
To the best of our knowledge, this is the first German-language dataset that combines annotations for NER, EL and RE.
arXiv Detail & Related papers (2021-08-16T08:21:50Z)
- Joint Embedding in Named Entity Linking on Sentence Level [30.229263131244906]
We propose a new unified embedding method by maximizing the relationships learned from knowledge graphs.
We focus on how to link entities for mentions at the sentence level, which reduces the noise introduced by different appearances of the same mention in a document.
arXiv Detail & Related papers (2020-02-12T12:06:32Z)