Linking Named Entities in Diderot's \textit{Encyclopédie} to Wikidata
- URL: http://arxiv.org/abs/2406.03221v1
- Date: Wed, 5 Jun 2024 13:00:04 GMT
- Title: Linking Named Entities in Diderot's \textit{Encyclopédie} to Wikidata
- Authors: Pierre Nugues
- Abstract summary: Diderot's \textit{Encyclopédie} is a reference work from the XVIIIth century in Europe that aimed at collecting the knowledge of its era.
The lack of digital connection between the two encyclopedias may hinder their comparison and the study of how knowledge has evolved.
We describe the annotation of more than 10,300 of the \textit{Encyclopédie} entries with Wikidata identifiers, enabling us to connect these entries to the Wikidata graph.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diderot's \textit{Encyclopédie} is a reference work from the XVIIIth century in Europe that aimed at collecting the knowledge of its era. \textit{Wikipedia} has the same ambition with a much greater scope. However, the lack of digital connection between the two encyclopedias may hinder their comparison and the study of how knowledge has evolved. A key element of \textit{Wikipedia} is Wikidata, which backs the articles with a graph of structured data. In this paper, we describe the annotation of more than 10,300 of the \textit{Encyclopédie} entries with Wikidata identifiers, enabling us to connect these entries to the graph. We considered geographic and human entities. The \textit{Encyclopédie} does not contain biographic entries, as they mostly appear as subentries of locations. We extracted all the geographic entries and we completely annotated all the entries containing a description of human entities. This represents more than 2,600 links referring to locations or human entities. In addition, we annotated more than 9,500 entries having a geographic content only. We describe the annotation process as well as application examples. This resource is available at https://github.com/pnugues/encyclopedie_1751
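As a sketch of what such links enable (not code from the paper or its repository), an entry carrying a Wikidata QID can be enriched through the public Wikidata SPARQL endpoint. The QID Q90 (Paris) and the helper function below are illustrative assumptions; the actual annotation files live in the GitHub repository above.

```python
# Minimal sketch: enrich an Encyclopédie entry linked to a Wikidata QID
# by querying the public Wikidata SPARQL endpoint. Q90 (Paris) is used
# here purely for illustration.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def fetch_label_and_coords(qid: str) -> list[dict]:
    """Fetch the French label and, if present, the coordinates of a Wikidata item."""
    query = f"""
    SELECT ?label ?coord WHERE {{
      wd:{qid} rdfs:label ?label .
      FILTER(LANG(?label) = "fr")
      OPTIONAL {{ wd:{qid} wdt:P625 ?coord . }}
    }}"""
    response = requests.get(
        SPARQL_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "encyclopedie-demo/0.1 (example)"},
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]

print(fetch_label_and_coords("Q90"))  # e.g. label "Paris" and its P625 coordinates
```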
Related papers
- Towards a Brazilian History Knowledge Graph [50.26735825937335]
We construct a knowledge graph for Brazilian history based on the Brazilian Dictionary of Historical Biographies (DHBB) and Wikipedia/Wikidata.
We show that many terms/entities described in the DHBB do not have corresponding concepts (or Q items) in Wikidata.
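A minimal sketch of one way to run such a coverage check, using the real Wikidata `wbsearchentities` API (the search term below is only an example):

```python
# Check whether a DHBB term has a candidate Q item in Wikidata.
import requests

def has_q_item(term: str, lang: str = "pt") -> bool:
    """Return True if a Wikidata entity search yields at least one hit."""
    response = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "wbsearchentities", "search": term,
        "language": lang, "format": "json",
    })
    response.raise_for_status()
    return bool(response.json().get("search"))

print(has_q_item("Getúlio Vargas"))  # a well-known figure, expected True
```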
arXiv Detail & Related papers (2024-03-28T22:05:32Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
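A toy illustration of the statement-to-sentence idea (the property phrases are hardcoded assumptions, not the paper's pipeline):

```python
# Verbalize a Wikidata statement: a triple, optionally extended with a
# qualifier into a quadruple. Property phrases are illustrative only.
PROPERTY_PHRASES = {
    "P19": "was born in",   # place of birth
    "P106": "worked as",    # occupation
}

def verbalize(subject: str, prop: str, value: str, qualifier: str | None = None) -> str:
    """Map one (subject, property, value[, qualifier]) statement to a sentence."""
    sentence = f"{subject} {PROPERTY_PHRASES[prop]} {value}"
    if qualifier:
        sentence += f" ({qualifier})"
    return sentence + "."

print(verbalize("Denis Diderot", "P19", "Langres"))
# -> Denis Diderot was born in Langres.
```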
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z)
- Connecting a French Dictionary from the Beginning of the 20th Century to Wikidata [0.0]
The \textit{Petit Larousse illustré} is a French dictionary first published in 1905.
We describe a new lexical resource, where we connected all the dictionary entries of the history and geography part to current data sources.
Using the Wikidata links, we can more easily automate the identification, comparison, and verification of historically situated representations.
arXiv Detail & Related papers (2022-06-22T12:45:21Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
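A toy sketch of the retrieval step, where token overlap stands in for a real search engine (e.g. BM25); the index and profile text are invented for illustration:

```python
# Profile-based candidate retrieval: score indexed entities by token
# overlap with a generated profile. Everything below is illustrative.
def tokens(text: str) -> set[str]:
    return set(text.lower().replace(",", " ").split())

# Hypothetical mini-index: QID -> label plus description.
INDEX = {
    "Q90": "Paris capital and largest city of France",
    "Q456": "Lyon city in Auvergne-Rhone-Alpes France",
}

def retrieve(profile: str, k: int = 5) -> list[str]:
    """Rank candidate QIDs by token overlap with the profile."""
    scores = {qid: len(tokens(profile) & tokens(doc)) for qid, doc in INDEX.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(retrieve("Paris, the capital city of France"))  # -> ['Q90', 'Q456']
```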
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
- Survey on English Entity Linking on Wikidata [3.8289963781051415]
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph.
Current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia.
Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure.
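For instance, a qualifier turns a plain triple into a hyper-relational statement; a minimal sketch with simplified string values:

```python
# Wikidata's hyper-relational structure: a main triple plus qualifier
# key-value pairs, which plain triple-based annotation schemes ignore.
from dataclasses import dataclass, field

@dataclass
class Statement:
    subject: str                                  # e.g. "Q90" (Paris)
    predicate: str                                # e.g. "P1082" (population)
    value: str                                    # simplified to a string
    qualifiers: dict[str, str] = field(default_factory=dict)

# "Paris had a population of roughly 2.1 million as of 2020": the
# qualifier P585 (point in time) is what a bare triple cannot express.
stmt = Statement("Q90", "P1082", "2100000", {"P585": "2020"})
print(stmt)
```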
arXiv Detail & Related papers (2021-12-03T16:02:42Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
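A toy sketch of the scaling-up step, assuming scikit-learn and invented reference features (the paper's actual features and models may differ):

```python
# Scale crowdsourced reference judgments up with a classifier.
# Features and labels below are toy stand-ins for the curated assessments.
from sklearn.linear_model import LogisticRegression

# toy features per reference: [is_https, domain_reputation, language_match]
X = [[1, 0.9, 1], [0, 0.2, 0], [1, 0.7, 1], [0, 0.1, 1]]
y = [1, 0, 1, 0]  # 1 = judged relevant/authoritative by crowd workers

model = LogisticRegression().fit(X, y)
print(model.predict([[1, 0.8, 1]]))  # predicted quality for a new reference
```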
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- Commonsense Knowledge in Wikidata [3.8359194344969807]
This paper investigates whether Wikidata contains commonsense knowledge that is complementary to existing commonsense sources.
We map the relations of Wikidata to ConceptNet, which we also leverage to integrate Wikidata-CS into an existing consolidated commonsense graph.
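An illustrative fragment of such a property-to-relation mapping (assumed for this sketch, not the paper's actual table):

```python
# Illustrative mapping from Wikidata properties to ConceptNet relations.
WD_TO_CONCEPTNET = {
    "P279": "/r/IsA",       # subclass of
    "P361": "/r/PartOf",    # part of
    "P527": "/r/HasA",      # has part
    "P461": "/r/Antonym",   # opposite of
}

def map_triple(subj: str, prop: str, obj: str) -> tuple[str, str, str] | None:
    """Translate a Wikidata triple into a ConceptNet-style edge, if mapped."""
    relation = WD_TO_CONCEPTNET.get(prop)
    return (subj, relation, obj) if relation else None

print(map_triple("Q144", "P279", "Q39201"))  # dog subclass-of pet (illustrative QIDs)
```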
arXiv Detail & Related papers (2020-08-18T18:23:06Z)
- Entity Extraction from Wikipedia List Pages [2.3605348648054463]
We build a large taxonomy from categories and list pages with DBpedia as a backbone.
With distant supervision, we extract training data for the identification of new entities in list pages.
We extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
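A toy sketch of the distant-supervision idea, where known entities label list-page items (the seed set and items are invented; treating non-matches as negatives is a simplification):

```python
# Distant supervision: list-page items matching a known entity of the
# expected type become positive training examples; the rest are treated
# as negatives here for simplicity.
KNOWN_SONGS = {"Hey Jude", "Yesterday"}  # hypothetical seed set from DBpedia

list_page_items = ["Hey Jude", "Yesterday", "Track listing", "References"]

training_data = [(item, item in KNOWN_SONGS) for item in list_page_items]
print(training_data)
# [('Hey Jude', True), ('Yesterday', True), ('Track listing', False), ('References', False)]
```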
arXiv Detail & Related papers (2020-03-11T07:48:46Z)
- From Topic Networks to Distributed Cognitive Maps: Zipfian Topic Universes in the Area of Volunteered Geographic Information [59.0235296929395]
We investigate how language encodes and networks geographic information on the aboutness level of texts.
Our study shows a Zipfian organization of the thematic universe in which geographical places are located in online communication.
Places, whether close to each other or not, are located in neighboring places that span similar networks in the topic universe.
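A quick sketch of testing for such a Zipfian organization, using toy topic frequencies and a log-log linear fit:

```python
# Check a Zipfian (power-law) rank-frequency pattern by fitting a line
# in log-log space. The topic frequencies below are toy data (~ 1/rank).
import numpy as np

topic_freqs = np.array([1000, 510, 330, 240, 200, 170, 140, 125, 110, 100])
ranks = np.arange(1, len(topic_freqs) + 1)

# Zipf's law: f(r) ~ C * r^(-alpha), i.e. log f = log C - alpha * log r
slope, intercept = np.polyfit(np.log(ranks), np.log(topic_freqs), 1)
print(f"estimated Zipf exponent: {-slope:.2f}")  # close to 1 for Zipf-like data
```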
arXiv Detail & Related papers (2020-02-04T18:31:25Z)
- Classifying Wikipedia in a fine-grained hierarchy: what graphs can contribute [0.5530212768657543]
We address the task of integrating graph (i.e., structure) information to classify Wikipedia into a fine-grained named entity (NE) ontology.
We conduct at-scale practical experiments on a manually labeled subset of 22,000 pages extracted from the Japanese Wikipedia.
Our results show that integrating graph information succeeds at reducing the sparsity of the input feature space and yields classification results that are comparable to or better than previous works.
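A minimal sketch of the feature-combination idea (the feature matrices are toy stand-ins, not the paper's representation):

```python
# Combine sparse text features with graph-derived features (e.g. degree
# counts) into one denser input for a downstream classifier.
import numpy as np

text_features = np.array([[1, 0, 0, 1], [0, 1, 0, 0]])  # bag-of-words rows
graph_features = np.array([[3, 1], [7, 0]])             # e.g. in/out degree per page
combined = np.hstack([text_features, graph_features])   # joint feature view
print(combined.shape)  # (2, 6)
```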
arXiv Detail & Related papers (2020-01-21T14:19:49Z)