Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking
- URL: http://arxiv.org/abs/2202.13404v1
- Date: Sun, 27 Feb 2022 17:38:53 GMT
- Title: Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking
- Authors: Tuan Manh Lai, Heng Ji, ChengXiang Zhai
- Abstract summary: We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
- Score: 76.00737707718795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity linking (EL) is the task of linking entity mentions in a document to
referent entities in a knowledge base (KB). Many previous studies focus on
Wikipedia-derived KBs. There is little work on EL over Wikidata, even though it
is the most extensive crowdsourced KB. The scale of Wikidata can open up many
new real-world applications, but its massive number of entities also makes EL
challenging. To effectively narrow down the search space, we propose a novel
candidate retrieval paradigm based on entity profiling. Wikidata entities and
their textual fields are first indexed into a text search engine (e.g.,
Elasticsearch). During inference, given a mention and its context, we use a
sequence-to-sequence (seq2seq) model to generate the profile of the target
entity, which consists of its title and description. We use the profile to
query the indexed search engine to retrieve candidate entities. Our approach
complements the traditional approach of using a Wikipedia anchor-text
dictionary, enabling us to further design a highly effective hybrid method for
candidate retrieval. Combined with a simple cross-attention reranker, our
complete EL framework achieves state-of-the-art results on three Wikidata-based
datasets and strong performance on TACKBP-2010.
Related papers
- Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using LLMs [4.721309965816974]
We propose to make scholarly data more accessible sustainably by leveraging Wikidata's infrastructure.
Our study focuses on data from 105 Semantic Web-related conferences and extends/adds more than 6000 entities in Wikidata.
arXiv Detail & Related papers (2024-11-13T15:34:52Z) - Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from knowledge graph (KG)
We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR)
arXiv Detail & Related papers (2024-10-17T17:03:23Z) - The Fellowship of the Authors: Disambiguating Names from Social Network
Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are lacking and ambiguous named entities.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
arXiv Detail & Related papers (2022-08-31T21:51:55Z) - Enriching Wikidata with Linked Open Data [4.311189028205597]
Current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata.
We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation.
Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with a high quality.
arXiv Detail & Related papers (2022-07-01T01:50:24Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - Survey on English Entity Linking on Wikidata [3.8289963781051415]
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph.
Current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia.
Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure.
arXiv Detail & Related papers (2021-12-03T16:02:42Z) - Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (QA)
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z) - Assessing the quality of sources in Wikidata across languages: a hybrid
approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z) - Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion.
arXiv Detail & Related papers (2020-10-02T10:13:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.