Connecting a French Dictionary from the Beginning of the 20th Century to
Wikidata
- URL: http://arxiv.org/abs/2206.11022v1
- Date: Wed, 22 Jun 2022 12:45:21 GMT
- Title: Connecting a French Dictionary from the Beginning of the 20th Century to
Wikidata
- Authors: Pierre Nugues
- Abstract summary: The Petit Larousse illustré is a French dictionary first published in 1905.
We describe a new lexical resource in which we connected all the dictionary entries of the history and geography part to current data sources.
Using the Wikidata links, we can more easily automate the identification, comparison, and verification of historically situated representations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The \textit{Petit Larousse illustr\'e} is a French dictionary first published
in 1905. Its division into two main parts, on language and on history and
geography, marks a major milestone in French lexicography, and the dictionary
also serves as a repository of general knowledge from this period. Although the value of many
entries from 1905 remains intact, some descriptions now have a dimension that
is more historical than contemporary. They are nonetheless significant for
analyzing and understanding cultural representations from this time. A comparison
with more recent information or a verification of these entries would require
tedious manual work. In this paper, we describe a new lexical resource in which
we connected all the dictionary entries of the history and geography part to
current data sources. For this, we linked each of these entries to a Wikidata
identifier. Using the Wikidata links, we can more easily automate the
identification, comparison, and verification of historically situated
representations. We give a few examples of how to process Wikidata identifiers
and we carry out a small analysis of the entities described in the dictionary
to outline possible applications. The resource, i.e., the annotation of 20,245
dictionary entries with Wikidata links, is available from GitHub
(\url{https://github.com/pnugues/petit_larousse_1905/}).
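To illustrate how such Wikidata identifiers can be processed once each entry is linked, here is a minimal Python sketch that retrieves the current JSON record of an entity from Wikidata's public Special:EntityData endpoint and reads its French label and one property value. It is only an illustration under assumed inputs: the QID Q90 (Paris) and the property P1082 (population) are chosen for the example and do not come from the paper or the released resource.

```python
import requests

# Public Wikidata endpoint returning the full JSON record of one entity.
WIKIDATA_ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
HEADERS = {"User-Agent": "petit-larousse-1905-demo/0.1 (example script)"}


def fetch_entity(qid: str) -> dict:
    """Download the current Wikidata JSON document for one entity (QID)."""
    response = requests.get(WIKIDATA_ENTITY_URL.format(qid=qid),
                            headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()["entities"][qid]


def french_label(entity: dict):
    """Return the French label of the entity, if it has one."""
    label = entity.get("labels", {}).get("fr")
    return label["value"] if label else None


def first_statement_value(entity: dict, prop: str):
    """Return the main value of the first statement for a given property."""
    claims = entity.get("claims", {}).get(prop, [])
    if not claims:
        return None
    snak = claims[0]["mainsnak"]
    return snak.get("datavalue", {}).get("value")


if __name__ == "__main__":
    # Hypothetical illustration: Q90 (Paris) and P1082 (population) are not
    # taken from the paper; the released resource maps each dictionary entry
    # to its own QID, which would be used here instead.
    entity = fetch_entity("Q90")
    print(french_label(entity))                    # French label, e.g. 'Paris'
    print(first_statement_value(entity, "P1082"))  # one population statement (a quantity dict)
```

For batch comparisons across all 20,245 entries, querying the Wikidata Query Service with SPARQL would likely be more practical than fetching entities one by one.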
Related papers
- Linking Named Entities in Diderot's \textit{Encyclopédie} to Wikidata [0.0]
Diderot's Encyclopédie is a reference work from the 18th century in Europe that aimed at collecting the knowledge of its era.
The lack of digital connection between the two encyclopedias may hinder their comparison and the study of how knowledge has evolved.
We describe the annotation of more than 10,300 of the Encyclopédie entries with Wikidata identifiers, enabling us to connect these entries to the graph.
arXiv Detail & Related papers (2024-06-05T13:00:04Z)
- Towards a Brazilian History Knowledge Graph [50.26735825937335]
We construct a knowledge graph for Brazilian history based on the Brazilian Dictionary of Historical Biographies (DHBB) and Wikipedia/Wikidata.
We show that many terms/entities described in the DHBB do not have corresponding concepts (or Q items) in Wikidata.
arXiv Detail & Related papers (2024-03-28T22:05:32Z) - FRACAS: A FRench Annotated Corpus of Attribution relations in newS [0.0]
We present a manually annotated corpus of 1676 newswire texts in French for quotation extraction and source attribution.
We first describe the composition of our corpus and the choices that were made in selecting the data.
We then detail the inter-annotator agreement among the 8 annotators who carried out the manual labelling.
arXiv Detail & Related papers (2023-09-19T13:19:54Z) - Mapping Process for the Task: Wikidata Statements to Text as Wikipedia
Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z) - Longtonotes: OntoNotes with Longer Coreference Chains [111.73115731999793]
We build a corpus of coreference-annotated documents of significantly longer length than what is currently available.
The resulting corpus, which we call LongtoNotes, contains documents in multiple genres of the English language with varying lengths.
We evaluate state-of-the-art neural coreference systems on this new corpus.
arXiv Detail & Related papers (2022-10-07T15:58:41Z) - Does Wikidata Support Analogical Reasoning? [17.68704739786042]
We investigate whether the knowledge in Wikidata supports analogical reasoning.
We show that Wikidata can be used to create data for analogy classification.
We devise a set of metrics to guide an automatic method for extracting analogies from Wikidata.
arXiv Detail & Related papers (2022-10-02T20:46:52Z) - WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions
from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z) - Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z) - Assessing the quality of sources in Wikidata across languages: a hybrid
approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)