What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
- URL: http://arxiv.org/abs/2504.03803v1
- Date: Fri, 04 Apr 2025 09:09:06 GMT
- Title: What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
- Authors: Sander Noels, Guillaume Bied, Maarten Buyl, Alexander Rogiers, Yousra Fettach, Jefrey Lijffijt, Tijl De Bie
- Abstract summary: This work investigates the extent to which Large Language Models refuse to answer or omit information when prompted on political topics. Our analysis covers 14 state-of-the-art models from Western countries, China, and Russia, prompted in all six official United Nations (UN) languages.
- Score: 46.30336056625582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed as gateways to information, yet their content moderation practices remain underexplored. This work investigates the extent to which LLMs refuse to answer or omit information when prompted on political topics. To do so, we distinguish between hard censorship (i.e., generated refusals, error messages, or canned denial responses) and soft censorship (i.e., selective omission or downplaying of key elements), which we identify in LLMs' responses when asked to provide information on a broad range of political figures. Our analysis covers 14 state-of-the-art models from Western countries, China, and Russia, prompted in all six official United Nations (UN) languages. Our analysis suggests that although censorship is observed across the board, it is predominantly tailored to an LLM provider's domestic audience and typically manifests as either hard censorship or soft censorship (though rarely both concurrently). These findings underscore the need for ideological and geographic diversity among publicly available LLMs, and greater transparency in LLM moderation strategies to facilitate informed user choices. All data are made freely available.
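For illustration only, the hard/soft distinction described in the abstract can be pictured with a toy heuristic: flag canned refusals or empty replies as hard censorship, and flag answers that omit most of the key facts expected in an informative response as soft censorship. The marker list, threshold, and function below are hypothetical and do not reproduce the authors' actual detection pipeline.

```python
# Illustrative sketch only: a toy classifier mirroring the paper's hard/soft
# censorship distinction. The refusal markers, key-fact matching, and the 0.5
# threshold are assumptions made for this example, not the paper's method.

# Hypothetical canned-refusal phrases that would signal "hard censorship".
REFUSAL_MARKERS = [
    "i cannot help with that",
    "i'm sorry, but i can't",
    "this topic is not available",
]

def classify_censorship(response: str, key_facts: list[str]) -> str:
    """Label a model response as 'hard', 'soft', or 'none' (toy heuristic)."""
    text = response.lower()

    # Hard censorship: outright refusal, error message, or canned denial.
    if not text.strip() or any(marker in text for marker in REFUSAL_MARKERS):
        return "hard"

    # Soft censorship: the model answers, but omits most key elements that an
    # informative reference answer would be expected to mention.
    mentioned = sum(1 for fact in key_facts if fact.lower() in text)
    if key_facts and mentioned / len(key_facts) < 0.5:  # assumed threshold
        return "soft"

    return "none"

# Example usage with invented inputs.
print(classify_censorship(
    "She is a politician known for economic reforms.",
    key_facts=["politician", "opposition leader", "arrested"],
))  # -> "soft" under this toy heuristic
```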
Related papers
- Through the LLM Looking Glass: A Socratic Self-Assessment of Donkeys, Elephants, and Markets [42.55423041662188]
The study aims to directly measure the models' biases rather than relying on external interpretations. Our results reveal a consistent preference for Democratic over Republican positions across all models. Biases vary among Western LLMs, while those developed in China lean more strongly toward socialism.
arXiv Detail & Related papers (2025-03-20T19:40:40Z)
- Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing [34.69237228285959]
We study content moderation decisions made across countries using pre-existing corpora from the Twitter Stream Grab. Our experiments reveal interesting patterns in censored posts, both across countries and over time. We assess the effectiveness of using LLMs in content moderation.
arXiv Detail & Related papers (2025-03-07T09:49:31Z)
- Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? [2.321323878201932]
Political misinformation poses challenges to democratic processes, shaping public opinion and trust in media.
This study investigates the use of state-of-the-art large language models (LLMs) as reliable annotators for detecting political factuality in news articles.
arXiv Detail & Related papers (2024-11-08T18:36:33Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, the ability of Large Language Models (LLMs) to represent diverse groups is unclear.
By including additional context in prompts, we analyze LLMs' sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z)
- Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z)
- Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced into the constrained format.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- LLM Censorship: A Machine Learning Challenge or a Computer Security Problem? [52.71988102039535]
We show that semantic censorship can be perceived as an undecidable problem.
We argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs.
arXiv Detail & Related papers (2023-07-20T09:25:02Z)