TempEL: Linking Dynamically Evolving and Newly Emerging Entities
- URL: http://arxiv.org/abs/2302.02500v1
- Date: Sun, 5 Feb 2023 22:34:36 GMT
- Title: TempEL: Linking Dynamically Evolving and Newly Emerging Entities
- Authors: Klim Zaporojets, Lucie-Aimee Kaffee, Johannes Deleu, Thomas Demeester,
Chris Develder, Isabelle Augenstein
- Abstract summary: In our continuously evolving world, entities change over time and new, previously non-existing or unknown, entities appear.
We study how this evolutionary scenario impacts performance on the well-established entity linking (EL) task.
We introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In our continuously evolving world, entities change over time and new,
previously non-existing or unknown, entities appear. We study how this
evolutionary scenario impacts performance on the well-established entity
linking (EL) task. For that study, we introduce TempEL, an entity linking
dataset that consists of time-stratified English Wikipedia snapshots from 2013
to 2022, from which we collect both anchor mentions of entities and these
target entities' descriptions. By capturing such temporal aspects, our newly
introduced TempEL resource contrasts with currently existing entity linking
datasets, which are composed of fixed mentions linked to a single static
version of a target Knowledge Base (e.g., Wikipedia 2010 for CoNLL-AIDA).
Indeed, for each of our collected temporal snapshots, TempEL contains links to
entities that are continual, i.e., occur in all of the years, as well as
completely new entities that appear for the first time at some point. This
enables quantifying the performance of current state-of-the-art EL models for:
(i) entities that are subject to changes over time in their Knowledge Base
descriptions as well as their mentions' contexts, and (ii) newly created
entities that were previously non-existing (e.g., at the time the EL model was
trained). Our experimental results show that in terms of temporal performance
degradation, (i) continual entities suffer a decrease of up to 3.1% EL
accuracy, while (ii) for new entities this accuracy drop is up to 17.9%. This
highlights the challenge of the introduced TempEL dataset and opens new
research prospects in the area of time-evolving entity disambiguation.
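The evaluation the abstract describes, measuring EL accuracy separately for continual and newly emerging entities across temporal snapshots, can be sketched as follows. This is a minimal illustration, not the paper's actual evaluation code: the record layout, field names, and the toy mention data are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical mention records: each carries its snapshot year, the gold
# entity id, whether that entity is "continual" (occurs in every snapshot)
# or newly emerging, and a model's predicted entity id.
mentions = [
    {"year": 2013, "gold": "Q1", "continual": True,  "pred": "Q1"},
    {"year": 2013, "gold": "Q9", "continual": False, "pred": "Q9"},
    {"year": 2022, "gold": "Q1", "continual": True,  "pred": "Q1"},
    {"year": 2022, "gold": "Q9", "continual": False, "pred": "Q2"},
]

def accuracy_by_year(mentions, continual):
    """EL accuracy per temporal snapshot, restricted to one entity group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for m in mentions:
        if m["continual"] != continual:
            continue
        totals[m["year"]] += 1
        hits[m["year"]] += int(m["pred"] == m["gold"])
    return {year: hits[year] / totals[year] for year in totals}

continual_acc = accuracy_by_year(mentions, continual=True)
new_acc = accuracy_by_year(mentions, continual=False)

# Temporal degradation for a group: accuracy in the earliest snapshot
# minus accuracy in the latest one (the abstract reports up to 3.1% for
# continual entities and up to 17.9% for new ones on the real dataset).
degradation_new = new_acc[2013] - new_acc[2022]
```

On the toy data above, the continual entity is linked correctly in both snapshots while the new entity fails in 2022, mirroring (in exaggerated form) the asymmetry the paper reports.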
Related papers
- Exploiting Contextual Target Attributes for Target Sentiment
Classification [53.30511968323911]
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task.
We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z) - Instructed Language Models with Retrievers Are Powerful Entity Linkers [87.16283281290053]
Instructed Generative Entity Linker (INSGENEL) is the first approach that enables causal language models to perform entity linking over knowledge bases.
INSGENEL outperforms previous generative alternatives with +6.8 F1 points gain on average.
arXiv Detail & Related papers (2023-11-06T16:38:51Z) - Modeling Entities as Semantic Points for Visual Information Extraction
in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z) - NASTyLinker: NIL-Aware Scalable Transformer-based Entity Linker [2.3605348648054463]
We introduce an EL approach that is aware of NIL-entities and produces corresponding mention clusters while maintaining high linking performance for known entities.
We show the effectiveness and scalability of NASTyLinker on NILK, a dataset that is explicitly constructed to evaluate EL with respect to NIL-entities.
arXiv Detail & Related papers (2023-03-08T08:08:57Z) - Entity Cloze By Date: What LMs Know About Unseen Entities [79.34707800653597]
Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated.
We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained.
We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity.
arXiv Detail & Related papers (2022-05-05T17:59:31Z) - Entity Linking and Discovery via Arborescence-based Supervised
Clustering [35.93568319872986]
We present novel training and inference procedures that fully utilize mention-to-mention affinities.
We show that this method gracefully extends to entity discovery.
We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset.
arXiv Detail & Related papers (2021-09-02T23:05:58Z) - Robustness Evaluation of Entity Disambiguation Using Prior Probes:the
Case of Entity Overshadowing [11.513083693564466]
We evaluate and report the performance of popular entity linking systems on the ShadowLink benchmark.
Results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation.
arXiv Detail & Related papers (2021-08-24T20:54:56Z) - DESCGEN: A Distantly Supervised Datasetfor Generating Abstractive Entity
Descriptions [41.80938919728834]
We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description.
DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average.
The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities.
arXiv Detail & Related papers (2021-06-09T20:10:48Z) - HittER: Hierarchical Transformers for Knowledge Graph Embeddings [85.93509934018499]
We propose HittER to learn representations of entities and relations in a complex knowledge graph.
Experimental results show that HittER achieves new state-of-the-art results on multiple link prediction datasets.
We additionally propose a simple approach to integrate HittER into BERT and demonstrate its effectiveness on two Freebase factoid question answering datasets.
arXiv Detail & Related papers (2020-08-28T18:58:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.