Ideology-Based LLMs for Content Moderation
- URL: http://arxiv.org/abs/2510.25805v1
- Date: Wed, 29 Oct 2025 06:22:50 GMT
- Title: Ideology-Based LLMs for Content Moderation
- Authors: Stefano Civelli, Pietro Bernardelle, Nardiena A. Pratama, Gianluca Demartini
- Abstract summary: Large language models (LLMs) are increasingly used in content moderation systems. This study examines how persona adoption influences the consistency and fairness of harmful content classification.
- Score: 3.8439345751986913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly used in content moderation systems, where ensuring fairness and neutrality is essential. In this study, we examine how persona adoption influences the consistency and fairness of harmful content classification across different LLM architectures, model sizes, and content modalities (language vs. vision). At first glance, headline performance metrics suggest that personas have little impact on overall classification accuracy. However, a closer analysis reveals important behavioral shifts. Personas with different ideological leanings display distinct propensities to label content as harmful, showing that the lens through which a model "views" input can subtly shape its judgments. Further agreement analyses highlight that models, particularly larger ones, tend to align more closely with personas from the same political ideology, strengthening within-ideology consistency while widening divergence across ideological groups. To show this effect more directly, we conducted an additional study on a politically targeted task, which confirmed that personas not only behave more coherently within their own ideology but also exhibit a tendency to defend their perspective while downplaying harmfulness in opposing views. Together, these findings highlight how persona conditioning can introduce subtle ideological biases into LLM outputs, raising concerns about the use of AI systems that may reinforce partisan perspectives under the guise of neutrality.
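To make the setup concrete, below is a minimal sketch of persona-conditioned harmfulness classification together with a within- vs. cross-ideology agreement comparison, in the spirit of the paper's analysis. The persona descriptions, the `call_llm` stub, and the prompt wording are illustrative assumptions, not the authors' actual prompts or pipeline.

```python
# Minimal sketch (not the authors' pipeline): persona-conditioned harmfulness
# classification plus a within- vs. cross-ideology agreement comparison.
# The personas, the call_llm stub, and the prompts are all assumptions.
from itertools import combinations

# Hypothetical personas mapped to a coarse ideology label.
PERSONAS = {
    "progressive activist": "left",
    "liberal journalist": "left",
    "conservative pastor": "right",
    "right-wing commentator": "right",
}

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a real chat-completion client; wire up a model here."""
    raise NotImplementedError

def classify(persona: str, content: str) -> int:
    """Ask the model, speaking as `persona`, for a binary harmfulness label."""
    system = f"You are a {persona}. Answer only 'harmful' or 'not harmful'."
    answer = call_llm(system, f"Is the following content harmful?\n\n{content}")
    # Check the negated phrase first, since it contains "harmful" as a substring.
    return 0 if "not harmful" in answer.lower() else int("harmful" in answer.lower())

def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Chance-corrected agreement between two binary label sequences."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_chance = pa * pb + (1 - pa) * (1 - pb)
    return 1.0 if p_chance == 1 else (p_obs - p_chance) / (1 - p_chance)

def ideology_agreement(items: list[str]) -> dict[str, float]:
    """Average kappa for same-ideology vs. cross-ideology persona pairs."""
    labels = {p: [classify(p, x) for x in items] for p in PERSONAS}
    buckets: dict[str, list[float]] = {"within": [], "across": []}
    for (pa, ia), (pb, ib) in combinations(PERSONAS.items(), 2):
        key = "within" if ia == ib else "across"
        buckets[key].append(cohens_kappa(labels[pa], labels[pb]))
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

Under the paper's findings, the "within" average would exceed the "across" average, with the gap widening for larger models.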
Related papers
- Us-vs-Them bias in Large Language Models [0.569978892646475]
We find consistent ingroup-positive and outgroup-negative associations across foundational large language models. Among the personas examined, conservative personas exhibit greater outgroup hostility, whereas liberal personas display stronger ingroup solidarity.
arXiv Detail & Related papers (2025-12-03T07:11:22Z)
- Latent Topic Synthesis: Leveraging LLMs for Electoral Ad Analysis [51.95395936342771]
We introduce an end-to-end framework for automatically generating an interpretable topic taxonomy from an unlabeled corpus. We apply this framework to a large corpus of Meta political ads from the month ahead of the 2024 U.S. Presidential election. Our approach uncovers latent discourse structures, synthesizes semantically rich topic labels, and annotates topics with moral framing dimensions.
arXiv Detail & Related papers (2025-10-16T20:30:20Z)
- Understanding and evaluating computer vision models through the lens of counterfactuals [2.2819712364325047]
This thesis develops frameworks that use counterfactuals to explain, audit, and mitigate bias in vision classifiers and generative models. By systematically altering semantically meaningful attributes while holding others fixed, these methods uncover spurious correlations. These contributions show counterfactuals as a unifying lens for interpretability, fairness, and causality in both discriminative and generative models.
arXiv Detail & Related papers (2025-08-28T15:11:49Z)
- Political Ideology Shifts in Large Language Models [6.062377561249039]
We investigate how adopting synthetic personas influences ideological expression in large language models (LLMs). Our analysis reveals four consistent patterns: (i) larger models display broader and more implicit ideological coverage; (ii) susceptibility to explicit ideological cues grows with scale; (iii) models respond more strongly to right-authoritarian than to left-libertarian priming; and (iv) thematic content in persona descriptions induces ideological shifts, which amplify with size.
arXiv Detail & Related papers (2025-08-22T00:16:38Z)
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning [63.25540801694765]
Large Language Models (LLMs) demonstrate striking linguistic abilities, yet whether they balance compression and meaning the way humans do remains unclear. We apply the Information Bottleneck principle to quantitatively compare how LLMs and humans navigate this compression-meaning trade-off.
arXiv Detail & Related papers (2025-05-21T16:29:00Z)
- Fairness Mediator: Neutralize Stereotype Associations to Mitigate Bias in Large Language Models [66.5536396328527]
LLMs inadvertently absorb spurious correlations from training data, leading to stereotype associations between biased concepts and specific social groups. We propose Fairness Mediator (FairMed), a bias mitigation framework that neutralizes stereotype associations. Our framework comprises two main components: a stereotype association prober and an adversarial debiasing neutralizer.
arXiv Detail & Related papers (2025-04-10T14:23:06Z)
- "Whose Side Are You On?" Estimating Ideology of Political and News Content Using Large Language Models and Few-shot Demonstration Selection [5.277598111323805]
Existing approaches to classifying ideology are limited: they require extensive human effort and large labeled datasets, and they cannot adapt to evolving ideological contexts. This paper explores the potential of Large Language Models for classifying the political ideology of online content in the context of the two-party US political spectrum through in-context learning.
arXiv Detail & Related papers (2025-03-23T02:32:25Z)
- Beyond Partisan Leaning: A Comparative Analysis of Political Bias in Large Language Models [6.549047699071195]
This study adopts a persona-free, topic-specific approach to evaluate political behavior in large language models. We analyze responses from 43 large language models developed in the U.S., Europe, China, and the Middle East. Findings show most models lean center-left or left ideologically and vary in their nonpartisan engagement patterns.
arXiv Detail & Related papers (2024-12-21T19:42:40Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries and should take care to use neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can, at least in part, be attributed to confounding and mediating attributes and to model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.