Improving Wikipedia Verifiability with AI
- URL: http://arxiv.org/abs/2207.06220v1
- Date: Fri, 8 Jul 2022 15:23:29 GMT
- Title: Improving Wikipedia Verifiability with AI
- Authors: Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis,
Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick,
Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel
- Abstract summary: We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims.
Side's first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the top 10% of claims most likely to be unverifiable.
Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
- Score: 116.69749668874493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Verifiability is a core content policy of Wikipedia: claims that are likely
to be challenged need to be backed by citations. There are millions of articles
available online and thousands of new articles are released each month. For
this reason, finding relevant sources is a difficult task: many claims do not
have any references that support them. Furthermore, even existing citations
might not support a given claim or become obsolete once the original source is
updated or deleted. Hence, maintaining and improving the quality of Wikipedia
references is an important challenge and there is a pressing need for better
tools to assist humans in this effort. Here, we show that the process of
improving references can be tackled with the help of artificial intelligence
(AI). We develop a neural network based system, called Side, to identify
Wikipedia citations that are unlikely to support their claims, and subsequently
recommend better ones from the web. We train this model on existing Wikipedia
references, therefore learning from the contributions and combined wisdom of
thousands of Wikipedia editors. Using crowd-sourcing, we observe that for the
top 10% of citations most likely to be tagged as unverifiable by our system,
humans prefer the alternatives suggested by our system over the originally
cited reference 70% of the time. To validate the applicability of our system,
we built a demo to engage with the English-speaking Wikipedia community and
found that Side's first citation recommendation collects over 60% more
preferences than existing Wikipedia citations for the same top 10% of claims
deemed most likely to be unverifiable by Side. Our results indicate that an AI-based
system could be used, in tandem with humans, to improve the verifiability of
Wikipedia. More generally, we hope that our work can be used to assist fact
checking efforts and increase the general trustworthiness of information
online.
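
The abstract describes two components: spotting citations that fail verification and recommending better sources retrieved from the web. As a rough illustration of the ranking interface between a claim and candidate sources, here is a minimal, hypothetical sketch that uses an off-the-shelf cross-encoder from the sentence-transformers library as a stand-in for Side's verification engine; the model name, example texts, and decision rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions: sentence-transformers is installed; the
# cross-encoder below is a generic passage-relevance model standing in for
# Side's verification engine, not the paper's actual model).
from sentence_transformers import CrossEncoder

# A claim from a Wikipedia article and two candidate evidence passages:
# the text behind the existing citation and an alternative found on the web.
claim = "The Amazon rainforest produces about 20 percent of the world's oxygen."
existing_citation_text = "The Amazon basin covers roughly 7 million square kilometres."
alternative_passage = (
    "Scientists estimate the Amazon contributes a small net share of "
    "atmospheric oxygen, far below the widely quoted 20 percent figure."
)

# Generic relevance cross-encoder (illustrative choice of checkpoint).
scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

scores = scorer.predict([
    (claim, existing_citation_text),
    (claim, alternative_passage),
])

# Flag the citation as potentially failing verification if the alternative
# passage supports the claim better than the currently cited source.
if scores[1] > scores[0]:
    print("Recommend the alternative source for this claim.")
else:
    print("Keep the existing citation.")
```

Side's actual pipeline retrieves candidate sources from the web and scores them with models trained on existing Wikipedia references; the sketch above only mimics the final comparison between an existing reference and a suggested alternative.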
Related papers
- Forgotten Knowledge: Examining the Citational Amnesia in NLP [63.13508571014673]
We examine how far back in time we tend to go to cite papers, how that has changed over time, and what factors correlate with this citational attention/amnesia.
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
arXiv Detail & Related papers (2023-05-29T18:30:34Z) - Longitudinal Assessment of Reference Quality on Wikipedia [7.823541290904653]
This work analyzes the reliability of this global encyclopedia through the lens of its references.
We operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references (a sketch of computing these two metrics appears after this list).
arXiv Detail & Related papers (2023-03-09T13:04:14Z) - Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined in a two-stage extractive and abstractive approach to Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z) - Towards generating citation sentences for multiple references with
intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z) - Assessing the quality of sources in Wikidata across languages: a hybrid
approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z) - Wiki-Reliability: A Large Scale Dataset for Content Reliability on
Wikipedia [4.148821165759295]
We build the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues.
To build this dataset, we rely on Wikipedia "templates".
We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative.
arXiv Detail & Related papers (2021-05-10T05:07:03Z) - WhatTheWikiFact: Fact-Checking Claims Against Wikipedia [17.36054090232896]
We present WhatTheWikiFact, a system for automatic claim verification using Wikipedia.
The system predicts the veracity of an input claim, and it further shows the evidence it has retrieved as part of the verification process.
arXiv Detail & Related papers (2021-04-16T12:23:56Z) - Design Challenges in Low-resource Cross-lingual Entity Linking [56.18957576362098]
Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia.
This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention.
We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs.
arXiv Detail & Related papers (2020-05-02T04:00:26Z) - Entity Extraction from Wikipedia List Pages [2.3605348648054463]
We build a large taxonomy from categories and list pages with DBpedia as a backbone.
With distant supervision, we extract training data for the identification of new entities in list pages.
We extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
arXiv Detail & Related papers (2020-03-11T07:48:46Z) - Quantifying Engagement with Citations on Wikipedia [13.703047949952852]
One in 300 page views results in a reference click.
Clicks occur more frequently on shorter pages and on pages of lower quality.
Recent content, open access sources and references about life events are particularly popular.
arXiv Detail & Related papers (2020-01-23T15:52:36Z)
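
As a companion to the reference need (RN) and reference risk (RR) definitions quoted in the Longitudinal Assessment entry above, here is a minimal sketch of how the two metrics could be computed for a single article; the input data structures and the example domain list are illustrative assumptions, not that paper's exact operationalization.

```python
# Hypothetical sketch of the two metrics defined above; the inputs and the
# non-authoritative domain list are illustrative assumptions.

def reference_need(sentence_has_citation: list[bool]) -> float:
    """RN: percentage of the article's sentences missing a citation."""
    if not sentence_has_citation:
        return 0.0
    missing = sum(1 for cited in sentence_has_citation if not cited)
    return 100.0 * missing / len(sentence_has_citation)

def reference_risk(reference_domains: list[str],
                   non_authoritative: set[str]) -> float:
    """RR: proportion of references pointing to non-authoritative sources."""
    if not reference_domains:
        return 0.0
    risky = sum(1 for d in reference_domains if d in non_authoritative)
    return risky / len(reference_domains)

# Example usage with made-up data for a single article.
rn = reference_need([True, False, True, False, False])          # 60.0
rr = reference_risk(["example-blog.com", "nature.com", "who.int"],
                    non_authoritative={"example-blog.com"})      # ~0.33
print(f"RN = {rn:.1f}%  RR = {rr:.2f}")
```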
This list is automatically generated from the titles and abstracts of the papers on this site.