EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery
and Indexing
- URL: http://arxiv.org/abs/2205.12570v1
- Date: Wed, 25 May 2022 08:29:39 GMT
- Title: EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery
and Indexing
- Authors: Nora Kassner, Fabio Petroni, Mikhail Plekhanov, Sebastian Riedel,
Nicola Cancedda
- Abstract summary: Existing work on entity linking mostly assumes that the reference knowledge base is complete, and therefore all mentions can be linked.
This paper introduces the Unknown Entity Discovery and Indexing benchmark, where unknown entities, i.e. entities without a description in the knowledge base and without labeled mentions, have to be integrated into an existing entity linking system.
Building on dense-retrieval-based entity linking, we introduce the end-to-end EDIN pipeline that detects, clusters, and indexes mentions of unknown entities in context.
- Score: 28.62173704769311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing work on Entity Linking mostly assumes that the reference knowledge
base is complete, and therefore all mentions can be linked. In practice this is
hardly ever the case, as knowledge bases are incomplete and because novel
concepts arise constantly. This paper created the Unknown Entity Discovery and
Indexing (EDIN) benchmark where unknown entities, that is entities without a
description in the knowledge base and labeled mentions, have to be integrated
into an existing entity linking system. By contrasting EDIN with zero-shot
entity linking, we provide insight on the additional challenges it poses.
Building on dense-retrieval-based entity linking, we introduce the end-to-end
EDIN pipeline that detects, clusters, and indexes mentions of unknown entities
in context. Experiments show that indexing a single embedding per entity that
unifies the information of multiple mentions works better than indexing
mentions independently.
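The indexing choice the abstract describes can be illustrated with a minimal sketch (plain Python with hypothetical toy embeddings; the actual EDIN system uses learned dense encoders and an approximate-nearest-neighbor index): mention embeddings belonging to one discovered entity are pooled into a single normalized vector, and linking retrieves the nearest indexed entity by cosine similarity.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def entity_embedding(mention_embeddings):
    # Unify a cluster of mention embeddings into one entity embedding
    # by mean-pooling (one simple way to combine multiple mentions).
    dim = len(mention_embeddings[0])
    mean = [sum(m[i] for m in mention_embeddings) / len(mention_embeddings)
            for i in range(dim)]
    return normalize(mean)

def link(query_embedding, index):
    # Retrieve the entity whose single indexed embedding has the highest
    # cosine similarity to the query mention embedding.
    q = normalize(query_embedding)
    return max(index, key=lambda e: sum(a * b for a, b in zip(q, index[e])))

# Toy index: two unknown entities, each discovered as a mention cluster.
index = {
    "entity_A": entity_embedding([[1.0, 0.0], [0.9, 0.1]]),
    "entity_B": entity_embedding([[0.0, 1.0], [0.1, 0.9]]),
}
```

One pooled vector per entity also keeps the index size proportional to the number of entities rather than the number of mentions, which is a plausible practical benefit alongside the accuracy gain reported above.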
Related papers
- OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting [49.655711022673046]
OneNet is an innovative framework that utilizes the few-shot learning capabilities of Large Language Models (LLMs) without the need for fine-tuning.
OneNet is structured around three key components prompted by LLMs: (1) an entity reduction processor that simplifies inputs by summarizing and filtering out irrelevant entities, (2) a dual-perspective entity linker that combines contextual cues and prior knowledge for precise entity linking, and (3) an entity consensus judger that employs a unique consistency algorithm to alleviate the hallucination in the entity linking reasoning.
arXiv Detail & Related papers (2024-10-10T02:45:23Z) - Entity Disambiguation via Fusion Entity Decoding [68.77265315142296]
We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions.
We observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
arXiv Detail & Related papers (2024-04-02T04:27:54Z) - SpEL: Structured Prediction for Entity Linking [5.112679200269861]
We revisit the use of structured prediction for entity linking which classifies each individual input token as an entity, and aggregates the token predictions.
Our system, called SpEL, is a state-of-the-art entity linking system that uses some new ideas to apply structured prediction to the task of entity linking.
Our experiments show that we can outperform the state-of-the-art on the commonly used AIDA benchmark dataset for entity linking to Wikipedia.
arXiv Detail & Related papers (2023-10-23T08:24:35Z) - NASTyLinker: NIL-Aware Scalable Transformer-based Entity Linker [2.3605348648054463]
We introduce an EL approach that is aware of NIL-entities and produces corresponding mention clusters while maintaining high linking performance for known entities.
We show the effectiveness and scalability of NASTyLinker on NILK, a dataset that is explicitly constructed to evaluate EL with respect to NIL-entities.
arXiv Detail & Related papers (2023-03-08T08:08:57Z) - Focusing on Context is NICE: Improving Overshadowed Entity
Disambiguation [43.82625203429496]
NICE uses entity type information to leverage context and avoid over-relying on the frequency-based prior.
Our experiments show that NICE achieves the best performance results on the overshadowed entities while still performing competitively on the frequent entities.
arXiv Detail & Related papers (2022-10-12T13:05:37Z) - Entity Cloze By Date: What LMs Know About Unseen Entities [79.34707800653597]
Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated.
We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained.
We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity.
arXiv Detail & Related papers (2022-05-05T17:59:31Z) - Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities, produced without using any labeled information.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
arXiv Detail & Related papers (2021-12-15T05:05:12Z) - EntQA: Entity Linking as Question Answering [18.39360849304263]
We present EntQA, which stands for Entity linking as Question Answering.
Our approach combines progress in entity linking with that in open-domain question answering.
Unlike in previous works, we do not rely on a mention-candidates dictionary or large-scale weak supervision.
arXiv Detail & Related papers (2021-10-05T21:39:57Z) - Neural Production Systems [90.75211413357577]
Visual environments are structured, consisting of distinct objects or entities.
To partition images into entities, deep-learning researchers have proposed structural inductive biases.
We take inspiration from cognitive science and resurrect a classic approach, which consists of a set of rule templates.
This architecture achieves a flexible, dynamic flow of control and serves to factorize entity-specific and rule-based information.
arXiv Detail & Related papers (2021-03-02T18:53:20Z) - Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion.
arXiv Detail & Related papers (2020-10-02T10:13:31Z)
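The token-by-token name generation described for GENRE can be sketched with a prefix trie over entity names (a simplified stand-in for the real system, which masks a sequence-to-sequence decoder's vocabulary at each step; the toy names and whitespace tokenization below are illustrative assumptions):

```python
class NameTrie:
    # Prefix trie over tokenized entity names. At each decoding step the
    # decoder's choices are masked to the trie children of the current
    # prefix, so only valid entity names can be generated left to right.
    def __init__(self, names):
        self.root = {}
        for name in names:
            node = self.root
            for token in name.split():
                node = node.setdefault(token, {})
            node["<eos>"] = {}  # marks a complete entity name

    def allowed_next(self, prefix_tokens):
        # Return the tokens that may legally follow the given prefix.
        node = self.root
        for token in prefix_tokens:
            node = node.get(token, {})
        return sorted(node)

trie = NameTrie(["New York City", "New Zealand", "Paris"])
```

For example, after generating "New" the decoder may only continue with "York" or "Zealand", so it can never produce a name outside the knowledge base.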
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.