Architecture for a multilingual Wikipedia
- URL: http://arxiv.org/abs/2004.04733v1
- Date: Wed, 8 Apr 2020 22:25:10 GMT
- Title: Architecture for a multilingual Wikipedia
- Authors: Denny Vrandečić
- Abstract summary: We argue that we need a new approach to tackle this problem more effectively.
This paper proposes an architecture for a system that fulfills this goal.
It separates the goal into two parts: creating and maintaining content in an abstract notation within a project called Abstract Wikipedia, and creating an infrastructure called Wikilambda that can translate this notation to natural language.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wikipedia's vision is a world in which everyone can share in the sum of all
knowledge. In its first two decades, this vision has been very unevenly
achieved. One of the largest hindrances is the sheer number of languages
Wikipedia needs to cover in order to achieve that goal. We argue that we need a
new approach to tackle this problem more effectively, a multilingual Wikipedia
where content can be shared between language editions. This paper proposes an
architecture for a system that fulfills this goal. It separates the goal into two
parts: creating and maintaining content in an abstract notation within a
project called Abstract Wikipedia, and creating an infrastructure called
Wikilambda that can translate this notation to natural language. Both parts are
fully owned and maintained by the community, as is the integration of the
results in the existing Wikipedia editions. This architecture will make more
encyclopedic content available to more people in their own language, and at the
same time allow more people to contribute knowledge and reach more people with
their contributions, no matter their respective language backgrounds.
Additionally, Wikilambda will unlock a new type of knowledge asset people can
share in through the Wikimedia projects, functions, which will vastly expand
what people can do with knowledge from Wikimedia, and provide a new venue to
collaborate and to engage the creativity of contributors from all around the
world. These two projects will considerably expand the capabilities of the
Wikimedia platform to enable every single human being to freely share in the
sum of all knowledge.
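To make the proposed split concrete, here is a minimal Python sketch of how language-independent content and community-maintained renderer functions could fit together. The constructor format, field names, and renderer functions below are invented for illustration; they are not the paper's actual notation or the Wikilambda API.

```python
# Minimal sketch (not the paper's actual notation): abstract content is stored once,
# and per-language renderer functions turn it into natural-language text.

# A hypothetical "constructor" capturing language-independent content,
# roughly in the spirit of Abstract Wikipedia.
abstract_content = {
    "constructor": "instance_of",
    "subject": {"label": {"en": "San Francisco", "de": "San Francisco"}},
    "class":   {"label": {"en": "city in California", "de": "Stadt in Kalifornien"}},
}

# Hypothetical Wikilambda-style renderer functions, one per language.
# Real renderers would handle grammar (articles, agreement, word order) per language.
def render_en(content: dict) -> str:
    return f'{content["subject"]["label"]["en"]} is a {content["class"]["label"]["en"]}.'

def render_de(content: dict) -> str:
    return f'{content["subject"]["label"]["de"]} ist eine {content["class"]["label"]["de"]}.'

RENDERERS = {"en": render_en, "de": render_de}

def render(content: dict, lang: str) -> str:
    """Translate the abstract notation into natural language for one edition."""
    return RENDERERS[lang](content)

if __name__ == "__main__":
    for lang in ("en", "de"):
        print(lang, "->", render(abstract_content, lang))
```

The point of the sketch is only the division of labour the abstract describes: content is created and maintained once in the abstract layer, while each language community supplies and maintains its own rendering functions.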
Related papers
- Language-Agnostic Modeling of Wikipedia Articles for Content Quality Assessment across Languages [0.19698344608599344]
We propose a novel computational framework for modeling the quality of Wikipedia articles.
Our framework is based on language-agnostic structural features extracted from the articles.
We have built datasets with the feature values and quality scores of all revisions of all articles in the existing language versions of Wikipedia.
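As a rough illustration of what language-agnostic structural features could look like, the sketch below counts a few structural properties of an article and combines them into a toy score. The feature set, regular expressions, and weights are invented here; they are not the framework's actual features or trained model.

```python
# Illustrative sketch only: hypothetical language-agnostic structural features
# (counts of sections, references, images, and article length).
import re

def structural_features(wikitext: str) -> dict:
    """Extract simple structure counts that do not depend on the article's language."""
    return {
        "n_sections":   len(re.findall(r"^==+[^=].*?==+\s*$", wikitext, flags=re.M)),
        "n_references": len(re.findall(r"<ref[ >]", wikitext)),
        "n_images":     len(re.findall(r"\[\[(?:File|Image):", wikitext)),
        "length":       len(wikitext),
    }

def quality_score(features: dict) -> float:
    """Toy linear scorer with made-up weights; a real framework would learn these."""
    weights = {"n_sections": 0.05, "n_references": 0.02, "n_images": 0.03, "length": 1e-5}
    return min(1.0, sum(weights[k] * v for k, v in features.items()))

article = "== History ==\nText with a source.<ref>...</ref>\n[[File:Map.png|thumb]]\n"
print(quality_score(structural_features(article)))
```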
arXiv Detail & Related papers (2024-04-15T13:07:31Z)
- Orphan Articles: The Dark Matter of Wikipedia [13.290424502717734]
We conduct the first systematic study of orphan articles, which are articles without any incoming links from other Wikipedia articles.
We find that a surprisingly large extent of content, roughly 15% (8.8M) of all articles, is de facto invisible to readers navigating Wikipedia.
We also provide causal evidence through a quasi-experiment that adding new incoming links to orphans (de-orphanization) leads to a statistically significant increase of their visibility.
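A minimal sketch of the underlying definition: an orphan is an article with no incoming links from other articles. The tiny hand-made link graph below is hypothetical; the study itself works on the full Wikipedia link tables.

```python
# Toy sketch of orphan detection: an "orphan" article has no incoming links
# from other articles.
from collections import defaultdict

# Hypothetical article -> outgoing-links mapping.
outlinks = {
    "Article A": ["Article B", "Article C"],
    "Article B": ["Article A"],
    "Article C": [],
    "Article D": ["Article A"],   # nothing links *to* Article D -> orphan
}

def find_orphans(outlinks: dict[str, list[str]]) -> set[str]:
    indegree = defaultdict(int)
    for src, targets in outlinks.items():
        for tgt in targets:
            if tgt != src:          # ignore self-links
                indegree[tgt] += 1
    return {page for page in outlinks if indegree[page] == 0}

print(find_orphans(outlinks))       # {'Article D'}
```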
arXiv Detail & Related papers (2023-06-06T18:04:33Z)
- Kuaipedia: a Large-scale Multi-modal Short-video Encyclopedia [59.47639408597319]
Kuaipedia is a large-scale multi-modal encyclopedia consisting of items, aspects, and short videos linked to them.
It was extracted from billions of videos of Kuaishou, a well-known short-video platform in China.
arXiv Detail & Related papers (2022-10-28T12:54:30Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
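A toy illustration of the WS2T direction, assuming invented per-property templates rather than the paper's corpus-based mapping: a Wikidata-style statement is verbalized into an English sentence.

```python
# Verbalize a Wikidata-style statement (subject, property, value, optional qualifier)
# with a per-property template. The templates are invented for this example; the
# paper maps statements to sentences in English Wikipedia instead.

TEMPLATES = {
    "P569": "{subject} was born on {value}.",                 # date of birth
    "P69":  "{subject} was educated at {value}{qualifier}.",  # educated at
}

def statement_to_sentence(subject, prop, value, qualifier=None):
    extra = f" ({qualifier})" if qualifier else ""
    return TEMPLATES[prop].format(subject=subject, value=value, qualifier=extra)

# Triple: (Douglas Adams, date of birth, 11 March 1952)
print(statement_to_sentence("Douglas Adams", "P569", "11 March 1952"))
# Quadruple with a qualifier: (Douglas Adams, educated at, St John's College, BA 1974)
print(statement_to_sentence("Douglas Adams", "P69", "St John's College", qualifier="BA, 1974"))
```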
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus [76.9522248303716]
We propose a new setup for evaluating existing KI-NLP tasks in which we generalize the background corpus to a universal web snapshot.
We repurpose KILT, a standard KI-NLP benchmark initially developed for Wikipedia, and ask systems to use a subset of CCNet - the Sphere corpus.
We find that despite potential gaps of coverage, challenges of scale, lack of structure and lower quality, retrieval from Sphere enables a state-of-the-art retrieve-and-read system to match and even outperform Wikipedia-based models.
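For orientation, a generic retrieve-and-read loop might look like the sketch below, with a trivial word-overlap retriever over a tiny in-memory corpus and a stub reader standing in for the actual systems evaluated against Sphere.

```python
# Generic retrieve-and-read sketch: retrieve passages from a corpus, then let a
# reader produce an answer from the top passages. Scoring and "reader" are
# deliberately trivial stand-ins, not the benchmarked systems.

corpus = [
    "Wikipedia is a free online encyclopedia maintained by volunteers.",
    "CCNet is a pipeline for extracting and filtering Common Crawl text.",
    "Sphere is a web corpus derived from a subset of CCNet.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank passages by simple word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return scored[:k]

def read(query: str, passages: list[str]) -> str:
    """Stub reader: return the best-matching passage as the 'answer'."""
    return passages[0]

query = "What is the Sphere corpus?"
print(read(query, retrieve(query)))
```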
arXiv Detail & Related papers (2021-12-18T13:15:34Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
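The two-stage idea can be sketched as an extract-then-abstract pipeline; both stages below are deliberate stand-ins (word-overlap salience and simple concatenation), not the pretrained language models used in the paper.

```python
# Two-stage "extract then abstract" pipeline sketch for lead paragraph generation.

def extractive_stage(topic: str, sentences: list[str], k: int = 2) -> list[str]:
    """Pick the k sentences most related to the topic (toy word-overlap salience)."""
    t = set(topic.lower().split())
    return sorted(sentences, key=lambda s: len(t & set(s.lower().split())), reverse=True)[:k]

def abstractive_stage(topic: str, extracted: list[str]) -> str:
    """Placeholder for a pretrained summarizer that rewrites the extracted text."""
    return f"{topic}: " + " ".join(extracted)

web_sentences = [
    "Deep learning is a family of machine learning methods based on neural networks.",
    "The weather in Cambridge is mild in spring.",
    "Deep learning methods are widely used for speech and image recognition.",
]
print(abstractive_stage("Deep learning", extractive_stage("Deep learning", web_sentences)))
```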
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- A Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach [2.2889152373118975]
Despite Wikipedia editors' efforts to add and maintain its content, the distribution of links remains sparse in many language editions.
This paper introduces a machine-in-the-loop entity linking system that can comply with community guidelines for adding a link.
We develop an interactive recommendation interface that proposes candidate links to editors who can confirm, reject, or adapt the recommendation.
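The machine-in-the-loop workflow can be sketched as propose-then-review; the dictionary-based candidate generator and the simulated editor below are invented stand-ins for the paper's model and interface.

```python
# Sketch of the workflow: the system proposes link candidates and an editor
# confirms, rejects, or adapts each one.

# Hypothetical anchor-text -> target-article dictionary.
LINK_CANDIDATES = {
    "knowledge graph": "Knowledge graph",
    "wikidata": "Wikidata",
}

def propose_links(sentence: str) -> list[tuple[str, str]]:
    """Return (anchor text, candidate target) pairs found in the sentence."""
    lowered = sentence.lower()
    return [(a, t) for a, t in LINK_CANDIDATES.items() if a in lowered]

def review(proposals, editor_decision):
    """Keep only proposals the (simulated) editor confirms or adapts."""
    accepted = []
    for anchor, target in proposals:
        decision, final_target = editor_decision(anchor, target)
        if decision in ("confirm", "adapt"):
            accepted.append((anchor, final_target))
    return accepted

# Simulated editor: confirms everything except links to "Wikidata".
decide = lambda anchor, target: ("reject", target) if target == "Wikidata" else ("confirm", target)
sentence = "Wikidata is a collaboratively edited knowledge graph."
print(review(propose_links(sentence), decide))   # [('knowledge graph', 'Knowledge graph')]
```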
arXiv Detail & Related papers (2021-05-31T16:29:42Z)
- Crosslingual Topic Modeling with WikiPDA [15.198979978589476]
We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA).
It learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics.
We show its utility in two applications: a study of topical biases in 28 Wikipedia editions, and crosslingual supervised classification.
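As a toy version of this representation, the sketch below reduces an article to the Wikidata concepts it links to and normalizes them into a topic distribution; the concept-to-topic table is invented here, whereas WikiPDA learns its topics with LDA across editions.

```python
# Toy illustration: an article, reduced to the language-independent Wikidata
# concepts it links to, becomes a distribution over shared topics.
from collections import Counter

# Hypothetical mapping from Wikidata concepts to language-independent topics.
CONCEPT_TOPIC = {
    "Q7187": "biology",    # gene
    "Q8054": "biology",    # protein
    "Q523":  "astronomy",  # star
}

def topic_distribution(linked_concepts: list[str]) -> dict[str, float]:
    counts = Counter(CONCEPT_TOPIC[c] for c in linked_concepts if c in CONCEPT_TOPIC)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

# An article (in any language) represented only by the concepts it links to.
article_concepts = ["Q7187", "Q8054", "Q7187", "Q523"]
print(topic_distribution(article_concepts))  # {'biology': 0.75, 'astronomy': 0.25}
```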
arXiv Detail & Related papers (2020-09-23T15:19:27Z)
- Computational linguistic assessment of textbook and online learning media by means of threshold concepts in business education [59.003956312175795]
From a linguistic perspective, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features.
The profiles of 63 threshold concepts from business education have been investigated in textbooks, newspapers, and Wikipedia.
The three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles.
arXiv Detail & Related papers (2020-08-05T12:56:16Z)
- Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted.
Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias.
The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
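One simple way to quantify such (dis)similarities, not necessarily the measure used in the paper, is a Jaccard distance between the sets of topics linked from the same article in two language editions; the link sets below are hypothetical.

```python
# Toy quantification: how much does the link neighbourhood of the same topic
# differ between two language editions?

def jaccard_distance(a: set[str], b: set[str]) -> float:
    return 1.0 - len(a & b) / len(a | b) if (a or b) else 0.0

# Hypothetical linked-article sets for the article "Photosynthesis"
# in two language editions, mapped to shared identifiers.
links_en = {"Chlorophyll", "Calvin cycle", "Oxygen", "Light"}
links_de = {"Chlorophyll", "Calvin cycle", "Oxygen", "Chloroplast"}

print(round(jaccard_distance(links_en, links_de), 2))  # 0.4 -> the editions differ noticeably
```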
arXiv Detail & Related papers (2020-08-05T11:11:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.