Scalable Recommendation of Wikipedia Articles to Editors Using
Representation Learning
- URL: http://arxiv.org/abs/2009.11771v1
- Date: Thu, 24 Sep 2020 15:56:02 GMT
- Title: Scalable Recommendation of Wikipedia Articles to Editors Using
Representation Learning
- Authors: Oleksii Moskalenko, Denis Parra, and Diego Saez-Trumper
- Abstract summary: We develop a scalable system on top of Graph Convolutional Networks and Doc2Vec, learning how to represent Wikipedia articles and deliver personalized recommendations for editors.
We test our model on editors' histories, predicting their most recent edits based on their prior edits.
All of the data used in this paper is publicly available, including graph embeddings for Wikipedia articles, and we release our code to support replication of our experiments.
- Score: 1.8810916321241067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wikipedia is edited by volunteer editors around the world. Considering the
large amount of existing content (e.g. over 5M articles in English Wikipedia),
deciding what to edit next can be difficult, both for experienced users, who
usually have a huge backlog of articles to prioritize, and for newcomers, who
may need guidance in selecting the next article to contribute to. Therefore,
helping editors find relevant articles should improve their performance and
help retain new editors. In this paper, we address
the problem of recommending relevant articles to editors. To do this, we
develop a scalable system on top of Graph Convolutional Networks and Doc2Vec,
learning how to represent Wikipedia articles and deliver personalized
recommendations for editors. We test our model on editors' histories,
predicting their most recent edits based on their prior edits. We outperform
competitive implicit-feedback collaborative-filtering methods such as WMRF
based on ALS, as well as a traditional IR method, content-based filtering
based on BM25. All of the data used in this paper is publicly available,
including graph embeddings for Wikipedia articles, and we release our code to
support replication of our experiments. Moreover, we contribute a scalable
implementation of a state-of-the-art graph embedding algorithm, since existing
implementations cannot efficiently handle the sheer size of the Wikipedia graph.
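As a rough illustration of the recommendation idea described in the abstract, the minimal sketch below assumes precomputed article embeddings (e.g. produced by a GCN over the article graph or by Doc2Vec over article text), builds an editor profile by averaging the embeddings of previously edited articles, ranks candidates by cosine similarity, and holds out each editor's most recent edit to compute Recall@K. This is not the authors' released code; the mean-pooling profile, the function names, and the synthetic data are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation) of
# embedding-based article recommendation for editors and its evaluation.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precomputed article embeddings: num_articles x dim,
# L2-normalized so that dot product equals cosine similarity.
num_articles, dim = 1000, 64
article_emb = rng.normal(size=(num_articles, dim))
article_emb /= np.linalg.norm(article_emb, axis=1, keepdims=True)

def editor_profile(prior_edit_ids):
    """Aggregate an editor's edit history into one vector (simple mean here)."""
    vec = article_emb[prior_edit_ids].mean(axis=0)
    return vec / np.linalg.norm(vec)

def recommend(prior_edit_ids, k=10):
    """Rank articles by cosine similarity to the editor profile,
    excluding articles the editor has already edited."""
    profile = editor_profile(prior_edit_ids)
    scores = article_emb @ profile            # cosine similarity (unit vectors)
    scores[prior_edit_ids] = -np.inf          # do not re-recommend past edits
    return np.argsort(-scores)[:k]

def recall_at_k(histories, k=10):
    """Hold out each editor's most recent edit and check if it is retrieved."""
    hits = 0
    for history in histories:
        prior, held_out = history[:-1], history[-1]
        hits += int(held_out in recommend(prior, k))
    return hits / len(histories)

# Toy editor histories: lists of article ids ordered by edit time.
histories = [list(rng.choice(num_articles, size=6, replace=False))
             for _ in range(50)]
print(f"Recall@10 on synthetic data: {recall_at_k(histories, k=10):.3f}")
```

At Wikipedia scale the same ranking step would run over millions of articles, which is where the scalable graph-embedding implementation mentioned in the abstract (and, in practice, approximate nearest-neighbour search) becomes relevant.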
Related papers
- HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits [92.62157408704594]
HelloFresh is based on continuous streams of real-world data generated by intrinsically motivated human labelers.
It covers recent events from X (formerly Twitter) community notes and edits of Wikipedia pages.
It mitigates the risk of test data contamination and benchmark overfitting.
arXiv Detail & Related papers (2024-06-05T16:25:57Z) - Edisum: Summarizing and Explaining Wikipedia Edits at Scale [9.968020416365757]
We propose a model that recommends edit summaries produced by a language model trained to generate good edit summaries.
Our model performs on par with human editors.
More broadly, we showcase how language modeling technology can be used to support humans in maintaining one of the largest and most visible projects on the Web.
arXiv Detail & Related papers (2024-04-04T13:15:28Z) - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models [11.597314728459573]
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.
We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking.
arXiv Detail & Related papers (2024-02-22T01:20:17Z) - WikiIns: A High-Quality Dataset for Controlled Text Editing by Natural
Language Instruction [56.196512595940334]
We build and release WikiIns, a high-quality controlled text editing dataset with improved informativeness.
With the high-quality annotated dataset, we propose automatic approaches to generate a large-scale "silver" training set.
arXiv Detail & Related papers (2023-10-08T04:46:39Z) - SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages [87.08880616654258]
We introduce the SWiPE dataset, which reconstructs the document-level editing process from English Wikipedia (EW) articles to paired Simple Wikipedia (SEW) articles.
We work with Wikipedia editors to annotate 5,000 EW-SEW document pairs, labeling more than 40,000 edits with 19 proposed categories.
We find that SWiPE-trained models generate more complex edits while reducing unwanted edits.
arXiv Detail & Related papers (2023-05-30T16:52:42Z) - Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z) - Leveraging Wikidata's edit history in knowledge graph refinement tasks [77.34726150561087]
The edit history represents the process by which the community reaches some kind of fuzzy and distributed consensus.
We build a dataset containing the edit history of every instance from the 100 most important classes in Wikidata.
We propose and evaluate two new methods to leverage this edit history information in knowledge graph embedding models for type prediction tasks.
arXiv Detail & Related papers (2022-10-27T14:32:45Z) - Wiki-Reliability: A Large Scale Dataset for Content Reliability on
Wikipedia [4.148821165759295]
We build the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues.
To build this dataset, we rely on Wikipedia "templates".
We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative.
arXiv Detail & Related papers (2021-05-10T05:07:03Z) - References in Wikipedia: The Editors' Perspective [2.0609354896832492]
We explore the creation and collection of references for new Wikipedia articles from the editors' perspective.
We map out the workflow of editors when creating a new article, emphasising how they select references.
arXiv Detail & Related papers (2021-02-24T19:04:17Z) - Learning Structural Edits via Incremental Tree Transformations [102.64394890816178]
We present a generic model for incremental editing of structured data (i.e., "structural edits").
Our editor learns to iteratively generate tree edits (e.g., deleting or adding a subtree) and applies them to the partially edited data.
We evaluate our proposed editor on two source code edit datasets, where results show that, with the proposed edit encoder, our editor significantly improves accuracy over previous approaches.
arXiv Detail & Related papers (2021-01-28T16:11:32Z) - Layered Graph Embedding for Entity Recommendation using Wikipedia in the
Yahoo! Knowledge Graph [4.36080995655245]
We describe an embedding-based entity recommendation framework for Wikipedia.
We show that the resulting embeddings and recommendations perform well in terms of quality and user engagement.
arXiv Detail & Related papers (2020-04-15T00:49:27Z)