Diagnosing and Mitigating Semantic Inconsistencies in Wikidata's Classification Hierarchy
- URL: http://arxiv.org/abs/2511.04926v1
- Date: Fri, 07 Nov 2025 02:09:00 GMT
- Title: Diagnosing and Mitigating Semantic Inconsistencies in Wikidata's Classification Hierarchy
- Authors: Shixiong Zhao, Hideaki Takeda,
- Abstract summary: Wikidata is the largest open knowledge graph on the web, encompassing over 120 million entities.<n>This study proposes and applies a novel validation method to confirm the presence of classification errors and over-generalized subclass links.<n>We develop a system that allows users to inspect the taxonomic relationships of arbitrary Wikidata entities.
- Score: 1.4705700441788643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wikidata is currently the largest open knowledge graph on the web, encompassing over 120 million entities. It integrates data from various domain-specific databases and imports a substantial amount of content from Wikipedia, while also allowing users to freely edit its content. This openness has positioned Wikidata as a central resource in knowledge graph research and has enabled convenient knowledge access for users worldwide. However, its relatively loose editorial policy has also led to a degree of taxonomic inconsistency. Building on prior work, this study proposes and applies a novel validation method to confirm the presence of classification errors, over-generalized subclass links, and redundant connections in specific domains of Wikidata. We further introduce a new evaluation criterion for determining whether such issues warrant correction and develop a system that allows users to inspect the taxonomic relationships of arbitrary Wikidata entities-leveraging the platform's crowdsourced nature to its full potential.
Related papers
- Leveraging Wikidata's edit history in knowledge graph refinement tasks [77.34726150561087]
edit history represents the process in which the community reaches some kind of fuzzy and distributed consensus.
We build a dataset containing the edit history of every instance from the 100 most important classes in Wikidata.
We propose and evaluate two new methods to leverage this edit history information in knowledge graph embedding models for type prediction tasks.
arXiv Detail & Related papers (2022-10-27T14:32:45Z) - Enriching Wikidata with Linked Open Data [4.311189028205597]
Current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata.
We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation.
Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with a high quality.
arXiv Detail & Related papers (2022-07-01T01:50:24Z) - Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity)
arXiv Detail & Related papers (2022-02-03T17:25:46Z) - Taxonomy Enrichment with Text and Graph Vector Representations [61.814256012166794]
We address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy.
We present a new method that allows achieving high results on this task with little effort.
We achieve state-of-the-art results across different datasets and provide an in-depth error analysis of mistakes.
arXiv Detail & Related papers (2022-01-21T09:01:12Z) - Wikidated 1.0: An Evolving Knowledge Graph Dataset of Wikidata's
Revision History [5.727994421498849]
We present Wikidated 1.0, a dataset of Wikidata's full revision history.
To the best of our knowledge, it constitutes the first large dataset of an evolving knowledge graph.
arXiv Detail & Related papers (2021-12-09T15:54:03Z) - Survey on English Entity Linking on Wikidata [3.8289963781051415]
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph.
Current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia.
Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure.
arXiv Detail & Related papers (2021-12-03T16:02:42Z) - Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (QA)
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z) - Assessing the quality of sources in Wikidata across languages: a hybrid
approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z) - Commonsense Knowledge in Wikidata [3.8359194344969807]
This paper investigates whether Wikidata con-tains commonsense knowledge which is complementary to existing commonsense sources.
We map the relations of Wikidata to ConceptNet, which we also leverage to integrate Wikidata-CS into an existing consolidated commonsense graph.
arXiv Detail & Related papers (2020-08-18T18:23:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.