Related papers: Quantifying Feature Importance for Online Content Moderation

URL: http://arxiv.org/abs/2510.19882v1
Date: Wed, 22 Oct 2025 14:02:30 GMT
Title: Quantifying Feature Importance for Online Content Moderation
Authors: Benedetta Tessa, Alejandro Moreo, Stefano Cresci, Tiziano Fagni, Fabrizio Sebastiani
Abstract summary: We investigate the informativeness of socio-behavioural, linguistic, relational, and psychological features, in predicting behavioural changes of 16.8K users affected by a major moderation intervention on Reddit. Our results pave the way for the development of accurate systems that predict user reactions to moderation interventions.
Score: 38.70422886875624
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurately estimating how users respond to moderation interventions is paramount for developing effective and user-centred moderation strategies. However, this requires a clear understanding of which user characteristics are associated with different behavioural responses, which is the goal of this work. We investigate the informativeness of 753 socio-behavioural, linguistic, relational, and psychological features, in predicting the behavioural changes of 16.8K users affected by a major moderation intervention on Reddit. To reach this goal, we frame the problem in terms of "quantification", a task well-suited to estimating shifts in aggregate user behaviour. We then apply a greedy feature selection strategy with the double goal of (i) identifying the features that are most predictive of changes in user activity, toxicity, and participation diversity, and (ii) estimating their importance. Our results allow identifying a small set of features that are consistently informative across all tasks, and determining that many others are either task-specific or of limited utility altogether. We also find that predictive performance varies according to the task, with changes in activity and toxicity being easier to estimate than changes in diversity. Overall, our results pave the way for the development of accurate systems that predict user reactions to moderation interventions. Furthermore, our findings highlight the complexity of post-moderation user behaviour, and indicate that effective moderation should be tailored not only to user traits but also to the specific objective of the intervention.
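Illustrative sketch: the abstract combines two ideas, quantification (estimating the share of users in a behavioural class rather than labelling each user) and greedy feature selection. The Python below is a minimal sketch of how the two can fit together, not the authors' implementation. The nearest-centroid classifier, the classify-and-count quantifier, the error-reduction notion of importance, the toy data, and every function name are assumptions made for illustration; the abstract does not specify any of them.

```python
from statistics import mean


def centroid_classifier(X, y, features):
    """Fit a nearest-centroid classifier restricted to the given feature indices."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [mean(r[f] for r in rows) for f in features]

    c0, c1 = centroid(0), centroid(1)

    def predict(x):
        d0 = sum((x[f] - c) ** 2 for f, c in zip(features, c0))
        d1 = sum((x[f] - c) ** 2 for f, c in zip(features, c1))
        return 1 if d1 < d0 else 0

    return predict


def prevalence_error(X_tr, y_tr, X_val, y_val, features):
    """Classify-and-count: absolute error of the estimated positive-class prevalence."""
    predict = centroid_classifier(X_tr, y_tr, features)
    estimated = mean(predict(x) for x in X_val)
    return abs(estimated - mean(y_val))


def greedy_select(X_tr, y_tr, X_val, y_val, n_features, max_selected):
    """Forward selection: repeatedly add the feature that most reduces the error.

    Returns the selected feature indices and (feature, error reduction) pairs,
    where the error reduction serves as a simple importance score (an assumption).
    """
    # Baseline with no features: assume the training prevalence carries over.
    best_err = abs(mean(y_tr) - mean(y_val))
    selected, history = [], []
    while len(selected) < max_selected:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        err, f = min(
            (prevalence_error(X_tr, y_tr, X_val, y_val, selected + [f]), f)
            for f in candidates
        )
        if err >= best_err:  # stop when no candidate improves the estimate
            break
        history.append((f, best_err - err))
        selected.append(f)
        best_err = err
    return selected, history


# Toy data (hypothetical): feature 0 is informative, feature 1 is mostly noise.
X_train = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [0.3, 3.0],
           [1.0, 4.0], [0.9, 2.0], [0.8, 3.0], [1.1, 4.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
# The validation prevalence (0.75) differs from training (0.5), mimicking a
# shift in aggregate behaviour.
X_val = [[0.1, 3.2], [0.9, 2.0], [1.0, 5.0], [0.85, 3.0]]
y_val = [0, 1, 1, 1]

selected, history = greedy_select(X_train, y_train, X_val, y_val,
                                  n_features=2, max_selected=2)
print(selected, history)
```

On this toy data the procedure keeps only feature 0, since adding feature 1 cannot reduce the prevalence error any further. A real study would score candidates with cross-validation and a stronger quantifier rather than a single validation split.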

Related papers

Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It [81.50711040539566]
Current large language model (LLM) development treats task-solving and preference alignment as separate challenges. We introduce PREFDISCO, an evaluation methodology that transforms static benchmarks into interactive personalization tasks. Our framework creates scenarios where identical questions require different reasoning chains depending on user context.
arXiv Detail & Related papers (2025-09-30T18:55:28Z)
Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision making problem. The aim is to dynamically select which feature to measure based on current observations, independently for each test instance. Common approaches either use Reinforcement Learning, which experiences training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which makes myopic acquisitions. We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z)
Traits of a Leader: User Influence Level Prediction through Sociolinguistic Modeling [8.890331069484203]
We develop a model that significantly outperforms the baseline by leveraging demographic and personality data. This approach consistently improves RankDCG scores across eight different domains.
arXiv Detail & Related papers (2025-01-05T22:37:19Z)
Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems. We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
Learning under Imitative Strategic Behavior with Unforeseeable Outcomes [14.80947863438795]
We propose a Stackelberg game to model the interplay between individuals and the decision-maker. We show that the objective difference between the two can be decomposed into three interpretable terms.
arXiv Detail & Related papers (2024-05-03T00:53:58Z)
Beyond Trial-and-Error: Predicting User Abandonment After a Moderation Intervention [0.6918368994425961]
Current content moderation follows a reactive, trial-and-error approach. We introduce a proactive, predictive approach that enables moderators to anticipate the impact of their actions before implementation. We study the reactions of 16,540 users to a massive ban of online communities on Reddit.
arXiv Detail & Related papers (2024-04-23T08:52:41Z)
AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios. However, their capability in handling complex, multi-character social interactions has yet to be fully explored. We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z)
Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics. Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation [20.266695694005943]
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts.
arXiv Detail & Related papers (2023-07-12T17:55:08Z)
Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential. We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes. We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.