Related papers: Automatic Quality Assessment of Wikipedia Articles -- A Systematic Literature Review

Related papers

WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation [29.386002986862568]
We introduce WikiAutoGen, a novel system for automated multimodal Wikipedia-style article generation. Unlike prior approaches, WikiAutoGen retrieves and integrates relevant images alongside text, enriching both the depth and visual appeal of generated content. To further improve factual accuracy and comprehensiveness, we propose a multi-perspective self-reflection mechanism.
arXiv Detail & Related papers (2025-03-24T18:51:55Z)
REVERSUM: A Multi-staged Retrieval-Augmented Generation Method to Enhance Wikipedia Tail Biographies through Personal Narratives [4.427603894929721]
This study proposes a novel approach to enhancing Wikipedia's B and C category biography articles. By utilizing a multi-staged retrieval-augmented generation technique, we aim to enrich the informational content of lesser-known articles.
arXiv Detail & Related papers (2025-02-17T18:53:42Z)
How Good is Your Wikipedia? [13.814955569390207]
This paper critically examines the data quality of Wikipedia in a non-English setting by subjecting it to various quality filtering techniques. We find that data quality pruning is an effective means for resource-efficient training without hurting performance.
arXiv Detail & Related papers (2024-11-08T12:35:58Z)
A Test of Time: Predicting the Sustainable Success of Online Collaboration in Wikipedia [17.051622145253855]
We develop machine learning models to predict the sustainable success of Wikipedia articles. We find that the longer an article takes to be recognized as high-quality, the more likely it is to maintain that status over time. Our analysis provides insights into broader collective actions beyond Wikipedia.
arXiv Detail & Related papers (2024-10-24T20:42:53Z)
Language-Agnostic Modeling of Wikipedia Articles for Content Quality Assessment across Languages [0.19698344608599344]
We propose a novel computational framework for modeling the quality of Wikipedia articles. Our framework is based on language-agnostic structural features extracted from the articles. We have built datasets with the feature values and quality scores of all revisions of all articles in the existing language versions of Wikipedia.
arXiv Detail & Related papers (2024-04-15T13:07:31Z)
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews. The newly emerging AI-generated literature reviews are also appraised. This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia [14.325320851640084]
We propose WikiSQE, the first large-scale dataset for sentence quality estimation in Wikipedia. Each sentence is extracted from the entire revision history of English Wikipedia. WikiSQE has about 3.4 M sentences with 153 quality labels.
arXiv Detail & Related papers (2023-05-10T06:45:13Z)
The Glass Ceiling of Automatic Evaluation in Natural Language Generation [60.59732704936083]
We take a step back and analyze recent progress by comparing the body of existing automatic metrics and human metrics. Our extensive statistical analysis reveals surprising findings: automatic metrics -- old and new -- are much more similar to each other than to humans.
arXiv Detail & Related papers (2022-08-31T01:13:46Z)
Improving Wikipedia Verifiability with AI [116.69749668874493]
We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims. Our first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
arXiv Detail & Related papers (2022-07-08T15:23:29Z)
Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms. Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications. By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages. We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata. The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
Measuring Wikipedia Article Quality in One Dimension by Extending ORES with Ordinal Regression [1.52292571922932]
Article quality ratings on English language Wikipedia have been widely used by both Wikipedia community members and academic researchers. measuring quality presents many methodological challenges. The most widely used systems use labels on discrete ordinal scales when assessing quality, but such labels can be inconvenient for statistics and machine learning.
arXiv Detail & Related papers (2021-08-15T23:05:28Z)
Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv Detail & Related papers (2020-08-05T11:11:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.