ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity
Linking
- URL: http://arxiv.org/abs/2207.04108v1
- Date: Fri, 8 Jul 2022 19:20:42 GMT
- Title: ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity
Linking
- Authors: Tom Ayoola, Shubhi Tyagi, Joseph Fisher, Christos Christodoulopoulos,
Andrea Pierleoni
- Abstract summary: ReFinED is an efficient end-to-end entity linking model.
It performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass.
It surpasses state-of-the-art performance on standard entity linking datasets by an average of 3.7 F1.
- Score: 5.382800665115746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce ReFinED, an efficient end-to-end entity linking model which uses
fine-grained entity types and entity descriptions to perform linking. The model
performs mention detection, fine-grained entity typing, and entity
disambiguation for all mentions within a document in a single forward pass,
making it more than 60 times faster than competitive existing approaches.
ReFinED also surpasses state-of-the-art performance on standard entity linking
datasets by an average of 3.7 F1. The model is capable of generalising to
large-scale knowledge bases such as Wikidata (which has 15 times more entities
than Wikipedia) and of zero-shot entity linking. The combination of speed,
accuracy and scale makes ReFinED an effective and cost-efficient system for
extracting entities from web-scale datasets, for which the model has been
successfully deployed. Our code and pre-trained models are available at
https://github.com/alexa/ReFinED
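The abstract describes disambiguation driven by fine-grained entity types and entity descriptions. The following is a toy sketch of that scoring idea only, not the ReFinED API or architecture; the function, candidates, weights, and scores are all invented for illustration:

```python
# Illustrative sketch (NOT the ReFinED codebase): score candidate entities for one
# mention by combining a fine-grained-type overlap score with a precomputed
# mention-description similarity, as the abstract describes at a high level.

def disambiguate(mention_types, candidates, alpha=0.5):
    """Pick the candidate whose types and description best match the mention.

    mention_types: set of fine-grained types predicted for the mention.
    candidates: list of (entity_id, entity_types, description_score), where
      description_score is a mention-description similarity in [0, 1].
    """
    def score(cand):
        _, entity_types, desc_score = cand
        # Jaccard overlap between the mention's predicted types and the entity's types.
        overlap = len(mention_types & entity_types) / max(1, len(mention_types | entity_types))
        return alpha * overlap + (1 - alpha) * desc_score

    return max(candidates, key=score)[0]

mention_types = {"human", "politician"}
candidates = [
    ("Q76", {"human", "politician"}, 0.9),  # e.g. Barack Obama
    ("Q41773", {"city"}, 0.2),              # e.g. Obama, a city in Japan
]
print(disambiguate(mention_types, candidates))  # "Q76"
```

The real model computes both signals in a single forward pass over the whole document; this sketch only shows how the two signals could be combined per mention.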
Related papers
- OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting [49.655711022673046]
OneNet is an innovative framework that utilizes the few-shot learning capabilities of Large Language Models (LLMs) without the need for fine-tuning.
OneNet is structured around three LLM-prompted components: (1) an entity reduction processor that simplifies inputs by summarizing and filtering out irrelevant entities; (2) a dual-perspective entity linker that combines contextual cues and prior knowledge for precise linking; and (3) an entity consensus judger that applies a consistency algorithm to mitigate hallucination during entity-linking reasoning.
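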
arXiv Detail & Related papers (2024-10-10T02:45:23Z)
- Entity Disambiguation via Fusion Entity Decoding [68.77265315142296]
We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions.
We observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
arXiv Detail & Related papers (2024-04-02T04:27:54Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- SpEL: Structured Prediction for Entity Linking [5.112679200269861]
We revisit the use of structured prediction for entity linking which classifies each individual input token as an entity, and aggregates the token predictions.
Our system, called SpEL, is a state-of-the-art entity linking system that uses some new ideas to apply structured prediction to the task of entity linking.
Our experiments show that we can outperform the state-of-the-art on the commonly used AIDA benchmark dataset for entity linking to Wikipedia.
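The structured-prediction idea above (classify each token, then aggregate token predictions into mentions) can be sketched as follows. This is a minimal illustration, not SpEL's actual code; the helper name and labels are invented:

```python
# Minimal sketch (NOT SpEL's implementation) of the aggregation step: each token
# carries a per-token entity label, and runs of consecutive tokens sharing the
# same non-"O" label are merged into one (start, end, label) mention span.

def aggregate(tokens, labels):
    """Merge consecutive tokens with the same non-"O" label into spans."""
    spans, start = [], None
    for i, label in enumerate(labels + ["O"]):  # sentinel flushes the final run
        if start is not None and (i == len(labels) or label != labels[start]):
            spans.append((start, i, labels[start]))
            start = None
        if i < len(labels) and label != "O" and start is None:
            start = i
    return spans

tokens = ["Ayoola", "works", "at", "Amazon", "Alexa"]
labels = ["Tom_Ayoola", "O", "O", "Amazon", "Amazon"]
print(aggregate(tokens, labels))  # [(0, 1, 'Tom_Ayoola'), (3, 5, 'Amazon')]
```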
arXiv Detail & Related papers (2023-10-23T08:24:35Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document in which the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Entity Linking and Discovery via Arborescence-based Supervised Clustering [35.93568319872986]
We present novel training and inference procedures that fully utilize mention-to-mention affinities.
We show that this method gracefully extends to entity discovery.
We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset.
arXiv Detail & Related papers (2021-09-02T23:05:58Z)
- DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions [41.80938919728834]
We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description.
DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average.
The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities.
arXiv Detail & Related papers (2021-06-09T20:10:48Z)
- Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making [22.755892575582788]
Entity Matching (EM) aims to recognize entity records that denote the same real-world object.
We propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction.
Our method is highly efficient and outperforms SOTA EM models in most cases.
arXiv Detail & Related papers (2021-06-08T08:27:31Z)
- Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion.
arXiv Detail & Related papers (2020-10-02T10:13:31Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
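The representation described above (a vector whose dimensions are posterior probabilities over fine-grained types) can be illustrated with a toy example. The type inventory, entities, and probabilities below are invented for illustration, not taken from the paper:

```python
# Hedged sketch of interpretable type-based entity representations: each entity
# vector holds posterior probabilities over a fixed fine-grained type inventory,
# so every dimension can be read off by a human. Values here are made up.
import math

TYPES = ["person", "politician", "city", "company"]

def cosine(u, v):
    """Cosine similarity between two probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

obama   = [0.99, 0.97, 0.01, 0.00]  # P(type | entity), one value per TYPES entry
merkel  = [0.98, 0.95, 0.02, 0.01]
seattle = [0.01, 0.00, 0.99, 0.03]

# Two politicians are far more similar to each other than either is to a city,
# and the reason is inspectable dimension-by-dimension.
print(cosine(obama, merkel) > cosine(obama, seattle))  # True
```

Because each dimension names a type, such vectors work "out of the box" for similarity tasks while remaining human-readable, which is the property the summary highlights.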
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.