When Agents Persuade: Propaganda Generation and Mitigation in LLMs
- URL: http://arxiv.org/abs/2603.04636v1
- Date: Wed, 04 Mar 2026 21:56:29 GMT
- Title: When Agents Persuade: Propaganda Generation and Mitigation in LLMs
- Authors: Julia Jose, Ritik Roongta, Rachel Greenstadt
- Abstract summary: LLMs can be exploited to produce manipulative material. We analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda. We find that fine-tuning significantly reduces their tendency to generate such content, with ORPO proving most effective.
- Score: 2.1621083698499644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and use a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO). We find that fine-tuning significantly reduces their tendency to generate such content, with ORPO proving most effective.
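For concreteness, here is a minimal sketch of the two-model analysis pipeline the abstract describes, using the Hugging Face transformers pipeline API. The checkpoint names are placeholders, since the paper does not specify which classifiers it uses:

```python
# Hypothetical sketch of the two-model analysis pipeline; the
# checkpoint names below are placeholders, not the paper's models.
from transformers import pipeline

# Binary classifier: propaganda vs. non-propaganda.
binary_clf = pipeline("text-classification", model="org/propaganda-binary")

# Multi-label detector for rhetorical techniques such as loaded
# language, appeal to fear, flag-waving, and name-calling.
technique_clf = pipeline(
    "text-classification",
    model="org/propaganda-techniques",
    top_k=None,  # return a score for every technique label
)

generation = "Only a true patriot would stand behind this bill."
print(binary_clf(generation))     # e.g. [{'label': 'propaganda', 'score': 0.93}]
print(technique_clf(generation))  # per-technique scores for the same text
```

And a sketch of the ORPO mitigation step via the TRL library, assuming a preference dataset that pairs each propaganda-eliciting prompt with a preferred (neutral) and a rejected (propagandistic) completion; the base model and data file are illustrative, not the paper's setup:

```python
# Minimal ORPO fine-tuning sketch with TRL; the model name and data
# file are illustrative assumptions, not the paper's configuration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Expected columns: "prompt", "chosen" (neutral), "rejected" (propagandistic).
train_dataset = load_dataset("json", data_files="preference_pairs.json",
                             split="train")

config = ORPOConfig(
    output_dir="orpo-propaganda-mitigation",
    beta=0.1,  # weight of the odds-ratio penalty on rejected completions
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
)
trainer.train()
```

Unlike DPO, ORPO folds the preference signal into the supervised loss through an odds-ratio penalty, so no separate frozen reference model is required.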
Related papers
- UnWEIRDing LLM Entity Recommendations [0.0]
We use the WEIRD framework to evaluate recommendations by various Large Language Models across a dataset of fine-grained entities. Our results indicate that while prompting strategies can reduce WEIRD-style biases, the reduction is not consistent across models.
arXiv Detail & Related papers (2025-11-23T11:14:32Z)
- Passing the Turing Test in Political Discourse: Fine-Tuning LLMs to Mimic Polarized Social Media Comments [0.0]
This study explores the extent to which fine-tuned large language models (LLMs) can replicate and amplify polarizing discourse. Using a curated dataset of politically charged discussions extracted from Reddit, we fine-tune an open-source LLM to produce context-aware and ideologically aligned responses. The results indicate that, when trained on partisan data, LLMs are capable of producing highly plausible and provocative comments, often indistinguishable from those written by humans.
arXiv Detail & Related papers (2025-06-17T15:41:26Z)
- On the Adaptive Psychological Persuasion of Large Language Models [37.18479986426215]
We show that Large Language Models (LLMs) can autonomously persuade and resist persuasion. We introduce eleven comprehensive psychological persuasion strategies. We propose an adaptive framework that trains LLMs to autonomously select optimal strategies.
arXiv Detail & Related papers (2025-06-07T13:52:50Z)
- Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking [61.61356842567952]
We propose STeP, a novel method for improving LLM-based agent training. We synthesize self-reflected trajectories that include reflections and corrections of error steps. Experiments demonstrate that our method improves agent performance across three representative tasks.
arXiv Detail & Related papers (2025-05-26T14:11:12Z)
- Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages [51.96666324242191]
We analyze whether user utilization of novel writing assistants in a charity advertisement writing task is affected by the AI's performance in a second language. We quantify the extent to which these patterns translate into the persuasiveness of generated charity advertisements.
arXiv Detail & Related papers (2025-02-13T17:49:30Z)
- Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even superhuman persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
- PropaInsight: Toward Deeper Understanding of Propaganda in Terms of Techniques, Appeals, and Intent [71.20471076045916]
Propaganda plays a critical role in shaping public opinion and fueling disinformation. PropaInsight systematically dissects propaganda into techniques, arousal appeals, and underlying intent. PropaGaze combines human-annotated data with high-quality synthetic data.
arXiv Detail & Related papers (2024-09-19T06:28:18Z)
- Language Models can Subtly Deceive Without Lying: A Case Study on Strategic Phrasing in Legislation [23.309640920644565]
Large language models (LLMs) can engage in subtle deception through strategic phrasing and intentional manipulation of information. This study highlights the risk posed by LLMs' capacity for strategic phrasing: using seemingly neutral language to attain self-serving goals.
arXiv Detail & Related papers (2024-05-07T13:55:11Z)
- See the Unseen: Better Context-Consistent Knowledge-Editing by Noises [73.54237379082795]
Knowledge-editing updates the knowledge of large language models (LLMs). Existing works ignore the consistency of knowledge across contexts, so their edits lack generalization.
We empirically find that the effects of different contexts upon LLMs in recalling the same knowledge follow a Gaussian-like distribution.
arXiv Detail & Related papers (2024-01-15T09:09:14Z)
- Boosting Large Language Model for Speech Synthesis: An Empirical Study [86.89548753080432]
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision.
We conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech, by combining pre-trained LLM LLaMA/OPT and text-to-speech synthesis model VALL-E.
We compare three integration methods between LLMs and speech models: directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E with the LLM serving as a powerful text encoder.
arXiv Detail & Related papers (2023-12-30T14:20:04Z)