Community Notes are Vulnerable to Rater Bias and Manipulation
- URL: http://arxiv.org/abs/2511.02615v1
- Date: Tue, 04 Nov 2025 14:39:34 GMT
- Title: Community Notes are Vulnerable to Rater Bias and Manipulation
- Authors: Bao Tran Truong, Siqi Wu, Alessandro Flammini, Filippo Menczer, Alexander J. Stewart
- Abstract summary: We evaluate the Community Notes algorithm using simulated data that models realistic rater and note behaviors. We find that the algorithm suppresses a substantial fraction of genuinely helpful notes and is highly sensitive to rater biases.
- Score: 75.34858521118305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media platforms increasingly rely on crowdsourced moderation systems like Community Notes to combat misinformation at scale. However, these systems face challenges from rater bias and potential manipulation, which may undermine their effectiveness. Here we systematically evaluate the Community Notes algorithm using simulated data that models realistic rater and note behaviors, quantifying error rates in publishing helpful versus unhelpful notes. We find that the algorithm suppresses a substantial fraction of genuinely helpful notes and is highly sensitive to rater biases, including polarization and in-group preferences. Moreover, a small minority (5-20%) of bad raters can strategically suppress targeted helpful notes, effectively censoring reliable information. These findings suggest that while community-driven moderation may offer scalability, its vulnerability to bias and manipulation raises concerns about reliability and trustworthiness, highlighting the need for improved mechanisms to safeguard the integrity of crowdsourced fact-checking.
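The kind of evaluation the abstract describes can be illustrated with a toy simulation. The sketch below is a hypothetical reconstruction, not the paper's actual code: it assumes a rank-1 matrix factorization with rater and note intercepts (similar in spirit to the open-source Community Notes scorer), and every population parameter, bias strength, and the publication threshold are illustrative assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated population (all parameters are illustrative assumptions) ---
n_raters, n_notes = 200, 50
rater_leaning = rng.choice([-1.0, 1.0], size=n_raters)            # two polarized camps
note_quality = rng.choice([1.0, 0.0], size=n_notes, p=[0.5, 0.5])  # 1 = genuinely helpful
note_leaning = rng.choice([-1.0, 1.0], size=n_notes)

# Each rater rates a random subset of notes. The probability of a "helpful"
# vote depends on true note quality plus an in-group bias toward notes
# aligned with the rater's own leaning.
mask = rng.random((n_raters, n_notes)) < 0.3
bias = 0.35 * (rater_leaning[:, None] == note_leaning[None, :])
p_helpful = np.clip(0.15 + 0.6 * note_quality[None, :] + bias, 0.0, 1.0)
votes = (rng.random((n_raters, n_notes)) < p_helpful).astype(float)
ratings = np.where(mask, votes, np.nan)

# --- Rank-1 matrix factorization with intercepts, fit by gradient descent ---
# r_ij ~ mu + u_i + n_j + f_i * g_j : the note intercept n_j serves as the
# "helpfulness" score, while the factor term f_i * g_j absorbs polarization.
mu, u, nv = 0.0, np.zeros(n_raters), np.zeros(n_notes)
f = 0.1 * rng.standard_normal(n_raters)
g = 0.1 * rng.standard_normal(n_notes)
obs = ~np.isnan(ratings)
lr, lam = 0.05, 0.03  # learning rate and L2 regularization strength
for _ in range(400):
    pred = mu + u[:, None] + nv[None, :] + f[:, None] * g[None, :]
    err = np.where(obs, ratings - pred, 0.0)
    mu += lr * err.sum() / obs.sum()
    u += lr * (err.sum(1) / np.maximum(obs.sum(1), 1) - lam * u)
    nv += lr * (err.sum(0) / np.maximum(obs.sum(0), 1) - lam * nv)
    f += lr * ((err * g[None, :]).sum(1) / np.maximum(obs.sum(1), 1) - lam * f)
    g += lr * ((err * f[:, None]).sum(0) / np.maximum(obs.sum(0), 1) - lam * g)

# Publish notes whose intercept clears a (hypothetical) threshold, then
# measure how many genuinely helpful notes the rule suppresses.
published = nv > 0.15
helpful = note_quality == 1.0
suppressed_frac = 1.0 - published[helpful].mean()
print(f"helpful notes suppressed: {suppressed_frac:.0%}")
```

Sweeping the in-group bias strength or replacing a fraction of raters with strategic ones who down-vote targeted notes is the kind of experiment that would probe the sensitivities the abstract reports.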
Related papers
- Towards Reliable Negative Sampling for Recommendation with Implicit Feedback via In-Community Popularity [8.257297407777555]
We propose ICPNS (In-Community Popularity Negative Sampling) to identify reliable and informative negative samples. Our approach is grounded in the insight that item exposure is driven by latent user communities. ICPNS yields consistent improvements on graph-based recommenders and competitive performance on MF-based models.
arXiv Detail & Related papers (2026-02-21T08:53:10Z) - Hyperactive Minority Alter the Stability of Community Notes [39.13508775153173]
We study the emergence and visibility of Community Notes on X. We show that contribution activity is highly concentrated. We replicate the notes' emergence process by integrating the open-source implementation of the Community Notes consensus algorithm.
arXiv Detail & Related papers (2026-02-09T18:04:54Z) - Conf-GNNRec: Quantifying and Calibrating the Prediction Confidence for GNN-based Recommendation Methods [16.528524630468773]
We propose a new method to quantify and calibrate the prediction confidence of GNN-based recommendations (Conf-GNNRec). Specifically, we propose a rating calibration method that dynamically adjusts excessive ratings to mitigate overconfidence based on user personalization. We also design a confidence loss function to reduce the overconfidence of negative samples and effectively improve recommendation performance.
arXiv Detail & Related papers (2025-05-22T09:48:17Z) - Uncertainty in Repeated Implicit Feedback as a Measure of Reliability [12.441205946216192]
Implicit and explicit feedback are prone to noise due to variability in human interactions. In collaborative filtering, the reliability of interaction signals is critical, as these signals determine user and item similarities. We analyze how repetition patterns intersect with key factors influencing user interest and develop methods to quantify the associated uncertainty.
arXiv Detail & Related papers (2025-05-05T09:18:47Z) - MisinfoEval: Generative AI in the Era of "Alternative Facts" [50.069577397751175]
We introduce a framework for generating and evaluating large language model (LLM) based misinformation interventions.
We present (1) an experiment with a simulated social media environment to measure effectiveness of misinformation interventions, and (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users.
Our findings confirm that LLM-based interventions are highly effective at correcting user behavior.
arXiv Detail & Related papers (2024-10-13T18:16:50Z) - Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z) - Free Lunch for Generating Effective Outlier Supervision [46.37464572099351]
We propose an ultra-effective method to generate near-realistic outlier supervision.
Our proposed BayesAug significantly reduces the false positive rate by over 12.50% compared with the previous schemes.
arXiv Detail & Related papers (2023-01-17T01:46:45Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of private data. Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
We present PRoFILE, a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.