Why Should This Article Be Deleted? Transparent Stance Detection in
Multilingual Wikipedia Editor Discussions
- URL: http://arxiv.org/abs/2310.05779v2
- Date: Mon, 23 Oct 2023 13:18:50 GMT
- Title: Why Should This Article Be Deleted? Transparent Stance Detection in
Multilingual Wikipedia Editor Discussions
- Authors: Lucie-Aimée Kaffee, Arnav Arora and Isabelle Augenstein
- Abstract summary: We construct a novel dataset of Wikipedia editor discussions along with their reasoning in three languages.
The dataset contains the stances of the editors (keep, delete, merge, comment), along with the stated reason and a content moderation policy for each edit decision.
We demonstrate that stance and corresponding reason (policy) can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process.
- Score: 47.944081120226905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The moderation of content on online platforms is usually non-transparent. On
Wikipedia, however, this discussion is carried out publicly and the editors are
encouraged to use the content moderation policies as explanations for making
moderation decisions. Currently, only a few comments explicitly mention those
policies -- 20% of the English ones, but as few as 2% of the German and Turkish
comments. To aid in this process of understanding how content is moderated, we
construct a novel multilingual dataset of Wikipedia editor discussions along
with their reasoning in three languages. The dataset contains the stances of
the editors (keep, delete, merge, comment), along with the stated reason and a
content moderation policy for each edit decision. We demonstrate that stance
and corresponding reason (policy) can be predicted jointly with a high degree
of accuracy, adding transparency to the decision-making process. We release
both our joint prediction models and the multilingual content moderation
dataset for further research on automated transparent content moderation.
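
The joint prediction setup described in the abstract can be approximated with a shared multilingual encoder feeding two classification heads: one for the stance labels (keep, delete, merge, comment) and one for the cited policy. The sketch below is a minimal illustration under stated assumptions, not the authors' released model: the encoder choice (xlm-roberta-base), the placeholder policy inventory size, and the example comment are all hypothetical.

```python
# Minimal sketch of joint stance + policy prediction over a shared multilingual
# encoder. Illustrative only: encoder choice, label inventories, and the example
# comment are assumptions, not the released model from the paper.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

STANCES = ["keep", "delete", "merge", "comment"]  # stance labels from the dataset description
NUM_POLICIES = 10                                 # placeholder; the real policy inventory differs per language


class JointStancePolicyClassifier(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-base", num_policies=NUM_POLICIES):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.stance_head = nn.Linear(hidden, len(STANCES))  # stance logits
        self.policy_head = nn.Linear(hidden, num_policies)  # policy logits

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # first-token ("CLS"-style) pooled representation
        return self.stance_head(cls), self.policy_head(cls)


tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = JointStancePolicyClassifier()

batch = tokenizer(
    ["Delete per WP:GNG, no independent sources establish notability."],  # hypothetical comment
    padding=True, truncation=True, return_tensors="pt",
)
stance_logits, policy_logits = model(batch["input_ids"], batch["attention_mask"])

# Joint training would sum the two cross-entropy losses, e.g. (with torch.nn.functional as F):
# loss = F.cross_entropy(stance_logits, stance_labels) + F.cross_entropy(policy_logits, policy_labels)
print(STANCES[stance_logits.argmax(dim=-1).item()])  # untrained heads, so this prediction is arbitrary
```

In such a setup the two heads are trained jointly by summing their cross-entropy losses, which is one common way to couple the stance decision with the policy that is meant to explain it.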
Related papers
- WiDe-analysis: Enabling One-click Content Moderation Analysis on Wikipedia's Articles for Deletion [10.756673240445709]
We introduce a suite of experiments on Wikipedia deletion discussions and wide-analysis (Wikipedia Deletion Analysis), a Python package aimed at providing one-click analysis of content moderation discussions.
We release all assets associated with wide-analysis, including data, models, the Python package, and a HuggingFace space, with the goal of accelerating research on automating content moderation on Wikipedia and beyond.
arXiv Detail & Related papers (2024-08-10T23:43:11Z)
- Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? [61.68363765350178]
This paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.
We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place.
Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent.
arXiv Detail & Related papers (2024-06-27T17:33:03Z)
- Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster [72.84926097773578]
We investigate the effect of explanations on the speed of real-world moderators.
Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision-making time by 7.4%.
arXiv Detail & Related papers (2024-06-06T14:23:10Z)
- Content Moderation on Social Media in the EU: Insights From the DSA Transparency Database [0.0]
The Digital Services Act (DSA) requires large social media platforms in the EU to provide clear and specific information whenever they restrict access to certain content.
Statements of Reasons (SoRs) are collected in the DSA Transparency Database to ensure transparency and scrutiny of content moderation decisions.
We empirically analyze 156 million SoRs within an observation period of two months to provide an early look at content moderation decisions of social media platforms in the EU.
arXiv Detail & Related papers (2023-12-07T16:56:19Z)
- Bridging Background Knowledge Gaps in Translation with Automatic Explicitation [13.862753200823242]
Professional translators incorporate explicitations to explain the missing context.
This work introduces techniques for automatically generating explicitations, motivated by WikiExpl.
The resulting explicitations are useful as they help answer questions more accurately in a multilingual question answering framework.
arXiv Detail & Related papers (2023-12-03T07:24:12Z)
- Multilingual Content Moderation: A Case Study on Reddit [23.949429463013796]
We propose to study the challenges of content moderation by introducing a multilingual dataset of 1.8 million Reddit comments.
We perform extensive experimental analysis to highlight the underlying challenges and suggest related research problems.
Our dataset and analysis can help better prepare for the challenges and opportunities of auto moderation.
arXiv Detail & Related papers (2023-02-19T16:36:33Z)
- CoRAL: a Context-aware Croatian Abusive Language Dataset [7.536701073553703]
We propose a language- and culturally-aware Croatian abusive language dataset covering phenomena of implicitness and reliance on local and global context.
We show experimentally that current models degrade when comments are not explicit and further degrade when language skill and context knowledge are required to interpret the comment.
arXiv Detail & Related papers (2022-11-11T08:10:13Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge [122.37011526554403]
NewsEdits is the first publicly available dataset of news revision histories.
It contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources.
arXiv Detail & Related papers (2022-06-14T18:47:13Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media platforms that enforce opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in users engaging with both types of content, showing a slight preference for questionable content, which may reflect dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.