Incivility and Rigidity: The Risks of Fine-Tuning LLMs for Political Argumentation
- URL: http://arxiv.org/abs/2411.16813v3
- Date: Fri, 20 Jun 2025 14:35:51 GMT
- Title: Incivility and Rigidity: The Risks of Fine-Tuning LLMs for Political Argumentation
- Authors: Svetlana Churina, Kokil Jaidka
- Abstract summary: Incivility prevalent on platforms like Twitter (now X) and Reddit poses a challenge for developing AI systems. In this study, we report experiments with GPT-3.5 Turbo, fine-tuned on two contrasting datasets of political discussions. We show that Reddit-finetuned models produce safer but rhetorically rigid arguments, while cross-platform fine-tuning amplifies toxicity.
- Score: 11.255011967393838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The incivility prevalent on platforms like Twitter (now X) and Reddit poses a challenge for developing AI systems that can support productive and rhetorically sound political argumentation. In this study, we report experiments with GPT-3.5 Turbo, fine-tuned on two contrasting datasets of political discussions: high-variance, high-incivility Twitter replies to U.S. Congress, and low-variance, low-incivility posts from Reddit's r/ChangeMyView. We systematically evaluate how these data sources and prompting strategies shape the rhetorical framing and deliberative quality of model-generated arguments. Our results show that Reddit-finetuned models produce safer but rhetorically rigid arguments, while cross-platform fine-tuning amplifies toxicity. Prompting reduces specific toxic behaviors, such as personal attacks, but fails to fully mitigate the influence of high-incivility training data. We introduce and validate a rhetorical evaluation rubric and provide practical guidelines for deploying LLMs in content authoring, moderation, and deliberation support.
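As a rough sketch of the fine-tuning setup described above, the snippet below formats (post, reply) pairs from one platform as chat-style training records and submits a GPT-3.5 Turbo fine-tuning job through the OpenAI API; the file name, system prompt, and placeholder data are assumptions for illustration rather than the authors' exact configuration.

```python
# Minimal sketch (not the authors' exact configuration): format platform-specific
# (post, reply) pairs as chat records and launch a GPT-3.5 Turbo fine-tuning job.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def to_chat_record(post: str, reply: str) -> dict:
    """Format one (post, reply) pair as a chat-style fine-tuning example."""
    return {
        "messages": [
            {"role": "system", "content": "You write civil, well-reasoned political arguments."},
            {"role": "user", "content": post},
            {"role": "assistant", "content": reply},
        ]
    }

# Placeholder data; in practice this would be the r/ChangeMyView or Twitter corpus.
pairs = [("CMV: Mandatory voting would improve democracy.", "I see the appeal, but consider ...")]
with open("cmv_train.jsonl", "w", encoding="utf-8") as f:
    for post, reply in pairs:
        f.write(json.dumps(to_chat_record(post, reply)) + "\n")

# Upload the training file and start the fine-tuning job.
training_file = client.files.create(file=open("cmv_train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id, job.status)
```

Repeating the same pipeline on the high-incivility Twitter replies would yield the contrasting model that the experiments compare against.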
Related papers
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models [57.834711966432685]
Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. We introduce the Bullshit Index, a novel metric quantifying large language models' indifference to truth. We observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy.
arXiv Detail & Related papers (2025-07-10T07:11:57Z) - Generative Exaggeration in LLM Social Agents: Consistency, Bias, and Toxicity [2.3997896447030653]
We investigate how Large Language Models (LLMs) behave when simulating political discourse on social media. We construct LLM agents based on 1,186 real users, prompting them to reply to politically salient tweets under controlled conditions. We find that richer contextualization improves internal consistency but also amplifies polarization, stylized signals, and harmful language.
arXiv Detail & Related papers (2025-07-01T10:54:51Z) - How Large Language Models play humans in online conversations: a simulated study of the 2016 US politics on Reddit [0.0]
Large Language Models (LLMs) have recently emerged as powerful tools for natural language generation. We evaluate the performance of LLMs in replicating user-generated content within a real-world, divisive scenario: Reddit conversations during the 2016 US Presidential election. We find that GPT-4 is able to produce realistic comments, both in favor of and against the candidate supported by the community, yet it tends to create consensus more easily than dissent.
arXiv Detail & Related papers (2025-06-23T08:54:32Z) - Passing the Turing Test in Political Discourse: Fine-Tuning LLMs to Mimic Polarized Social Media Comments [0.0]
This study explores the extent to which fine-tuned large language models (LLMs) can replicate and amplify polarizing discourse. Using a curated dataset of politically charged discussions extracted from Reddit, we fine-tune an open-source LLM to produce context-aware and ideologically aligned responses. The results indicate that, when trained on partisan data, LLMs are capable of producing highly plausible and provocative comments, often indistinguishable from those written by humans.
arXiv Detail & Related papers (2025-06-17T15:41:26Z) - Improving Large Language Model Safety with Contrastive Representation Learning [92.79965952162298]
Large Language Models (LLMs) are powerful tools with profound societal impacts. Their ability to generate responses to diverse and uncontrolled inputs leaves them vulnerable to adversarial attacks. We propose a defense framework that formulates model defense as a contrastive representation learning problem.
arXiv Detail & Related papers (2025-06-13T16:42:09Z) - The Impact of Persona-based Political Perspectives on Hateful Content Detection [4.04666623219944]
Politically diverse language models require computational resources often inaccessible to many researchers and organizations.
Recent work has established that persona-based prompting can introduce political diversity in model outputs without additional training.
We investigate whether such prompting strategies can achieve results comparable to political pretraining for downstream tasks.
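For intuition, here is a minimal sketch of persona-based prompting, where the same hatefulness query is issued under different political personas; the persona wording, model name, and yes/no parsing are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal sketch of persona-based prompting: the same hatefulness query is issued
# under different political personas. Persona wording, the model name, and the
# output parsing are illustrative assumptions, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()

PERSONAS = {
    "progressive": "You are a politically progressive annotator.",
    "conservative": "You are a politically conservative annotator.",
    "centrist": "You are a politically centrist annotator.",
}

def label_with_persona(text: str, persona: str) -> str:
    """Ask the model whether `text` is hateful, conditioned on a political persona."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system", "content": PERSONAS[persona]},
            {"role": "user", "content": f"Is the following post hateful? Answer yes or no.\n\n{text}"},
        ],
    )
    return response.choices[0].message.content.strip().lower()

post = "Example post to classify."
labels = {p: label_with_persona(post, p) for p in PERSONAS}
print(labels)  # disagreement across personas signals perspective-dependent judgments
```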
arXiv Detail & Related papers (2025-02-01T09:53:17Z) - Few-shot Policy (de)composition in Conversational Question Answering [54.259440408606515]
We propose a neuro-symbolic framework to detect policy compliance using large language models (LLMs) in a few-shot setting.
We show that our approach soundly reasons about policy compliance conversations by extracting sub-questions to be answered, assigning truth values from contextual information, and explicitly producing a set of logic statements from the given policies.
We apply this approach to ShARC, the popular policy compliance detection (PCD) and conversational machine reading benchmark, and show competitive performance with no task-specific fine-tuning.
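As a toy illustration of the symbolic side of such a pipeline, the snippet below composes truth values for policy sub-questions (which the paper derives with an LLM from the conversation) into a compliance verdict; the example policy and hard-coded truth values are hypothetical.

```python
# Toy illustration of the symbolic step: truth values for policy sub-questions
# (hard-coded here; assigned by an LLM from the dialogue in the paper) are
# composed according to the policy's logical structure. The policy is hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Policy:
    sub_questions: List[str]                 # natural-language sub-questions
    rule: Callable[[Dict[str, bool]], bool]  # logical composition of their answers

policy = Policy(
    sub_questions=["Is the applicant over 18?", "Is the applicant a resident?"],
    rule=lambda v: v["Is the applicant over 18?"] and v["Is the applicant a resident?"],
)

# Truth values an LLM would assign from the conversation context.
truth_values = {"Is the applicant over 18?": True, "Is the applicant a resident?": False}

print("compliant" if policy.rule(truth_values) else "not compliant")  # -> not compliant
```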
arXiv Detail & Related papers (2025-01-20T08:40:15Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - On the Use of Proxies in Political Ad Targeting [49.61009579554272]
We show that major political advertisers circumvented mitigations by targeting proxy attributes.
Our findings have crucial implications for the ongoing discussion on the regulation of political advertising.
arXiv Detail & Related papers (2024-10-18T17:15:13Z) - Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues [66.69453609603875]
Sociocultural norms serve as guiding principles for personal conduct in social interactions.
We propose a scalable approach for constructing a Sociocultural Norm (SCN) Base using Large Language Models (LLMs).
We construct a comprehensive and publicly accessible Chinese Sociocultural NormBase.
arXiv Detail & Related papers (2024-10-04T00:08:46Z) - Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z) - IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language [11.463652750122398]
We introduce IndoToxic2024, a comprehensive Indonesian hate speech and toxicity classification dataset.
Comprising 43,692 entries annotated by 19 diverse individuals, the dataset focuses on texts targeting vulnerable groups.
We establish baselines for seven binary classification tasks, achieving a macro-F1 score of 0.78 with a BERT model fine-tuned for hate speech classification.
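A generic sketch of this kind of baseline appears below: fine-tune a BERT checkpoint for binary classification and report macro-F1. The checkpoint, toy data, and hyperparameters are assumptions, not the authors' exact configuration.

```python
# Generic sketch of a BERT binary-classification baseline scored with macro-F1.
# The checkpoint, toy data, and hyperparameters are illustrative assumptions.
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "indobenchmark/indobert-base-p1"  # assumed Indonesian BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Placeholder rows; the real training data would be the annotated dataset texts.
data = Dataset.from_dict({"text": ["contoh teks satu", "contoh teks dua"], "label": [0, 1]})
data = data.map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=128))

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"macro_f1": f1_score(labels, preds, average="macro")}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=data,
    eval_dataset=data,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports eval_macro_f1 among other metrics
```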
arXiv Detail & Related papers (2024-06-27T17:26:38Z) - The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions [1.1624569521079426]
We show how to leverage synthetic data to train and improve stance detection agents for online political discussions.
We generate synthetic data for specific debate questions by prompting a Mistral-7B model.
We examine the impact of combining synthetic data with the most informative samples from an unlabelled dataset.
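As a minimal sketch of the data-generation step, the snippet below prompts an instruction-tuned Mistral-7B checkpoint for comments for and against a debate question; the checkpoint name, prompt template, and sampling settings are assumptions.

```python
# Minimal sketch: generating synthetic stance-labelled comments for a debate
# question with an instruction-tuned Mistral-7B checkpoint. The checkpoint name,
# prompt template, and sampling settings are assumptions for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

def synth_comments(question: str, stance: str, n: int = 3) -> list:
    """Sample n short comments taking the given stance on the debate question."""
    prompt = f'[INST] Write a short comment that is {stance} the proposal: "{question}" [/INST]'
    outputs = generator(prompt, max_new_tokens=80, num_return_sequences=n,
                        do_sample=True, temperature=0.9, return_full_text=False)
    return [o["generated_text"].strip() for o in outputs]

question = "Should voting be mandatory?"
synthetic = [(c, s) for s in ("in favor of", "against") for c in synth_comments(question, s)]
# `synthetic` now holds (comment, stance) pairs that can be mixed with the most
# informative unlabelled samples before training a stance-detection classifier.
```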
arXiv Detail & Related papers (2024-06-18T10:36:21Z) - LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback [16.57980268646285]
This paper studies how inappropriate language in arguments can be computationally mitigated.
We propose a reinforcement learning-based rewriting approach that balances content preservation and appropriateness.
We evaluate different weighting schemes for the reward function in both absolute and relative human assessment studies.
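Schematically, the weighting schemes under study can be pictured as a convex combination of a content-preservation score and an appropriateness score; both scorers in the sketch below are hypothetical placeholders, not the paper's reward models.

```python
# Schematic of a weighted rewriting reward trading off content preservation against
# appropriateness. Both scorers are hypothetical placeholders (in practice, e.g.,
# a semantic-similarity model and an appropriateness classifier).
def content_preservation(original: str, rewrite: str) -> float:
    """Placeholder similarity score in [0, 1] based on word overlap."""
    orig_words = set(original.lower().split())
    shared = orig_words & set(rewrite.lower().split())
    return len(shared) / max(len(orig_words), 1)

def appropriateness(rewrite: str) -> float:
    """Placeholder appropriateness score in [0, 1]; a classifier would go here."""
    return 0.0 if "idiotic" in rewrite.lower() else 1.0

def reward(original: str, rewrite: str, alpha: float = 0.5) -> float:
    """Convex combination of the two objectives; alpha is the weighting under study."""
    return alpha * content_preservation(original, rewrite) + (1 - alpha) * appropriateness(rewrite)

print(reward("Your argument is idiotic and wrong.", "I think your argument is mistaken.", alpha=0.4))
```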
arXiv Detail & Related papers (2024-06-05T15:18:08Z) - Changes in Policy Preferences in German Tweets during the COVID Pandemic [4.663960015139793]
We present a novel dataset of tweets with fine-grained political preference annotations.
A text classification model trained on this data is used to extract political opinions.
Results indicate that in response to the COVID pandemic, expression of political opinions increased.
arXiv Detail & Related papers (2023-07-31T16:07:28Z) - SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration [75.62448812759968]
This dataset is a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses.
The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines.
arXiv Detail & Related papers (2023-05-28T11:51:20Z) - Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis [17.172909510518814]
Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data.
Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis.
arXiv Detail & Related papers (2023-03-27T02:50:02Z) - NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models [13.887401380190335]
We introduce a novel, high-quality dataset of human-written perturbations, named NoisyHate.
We show that perturbations in NoisyHate have different characteristics than those in prior algorithm-generated toxic datasets.
arXiv Detail & Related papers (2023-03-18T14:54:57Z) - Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation [65.48908724440047]
We propose a method called reverse generation to construct adversarial contexts conditioned on a given response.
We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+ can largely expose their safety problems.
arXiv Detail & Related papers (2022-12-04T12:23:41Z) - Sayer: Using Implicit Feedback to Optimize System Policies [63.992191765269396]
We develop a methodology that leverages implicit feedback to evaluate and train new system policies.
Sayer builds on two ideas from reinforcement learning to leverage data collected by an existing policy.
We show that Sayer can evaluate arbitrary policies accurately, and train new policies that outperform the production policies.
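For background, the snippet below shows the standard inverse-propensity-scoring (IPS) estimator for evaluating a new policy on data logged by an existing one; this is the generic building block for this setting, not necessarily Sayer's exact estimator.

```python
# Standard inverse-propensity-scoring (IPS) estimator: the generic building block
# for evaluating a new policy on data logged by an existing one. Shown as
# background; not necessarily Sayer's exact estimator.
import numpy as np

def ips_value(logged, new_policy):
    """
    logged: list of (context, action, reward, logging_prob) tuples, where
            logging_prob is the probability the production policy gave `action`.
    new_policy: function (context, action) -> probability under the new policy.
    """
    weights = np.array([new_policy(x, a) / p for (x, a, _, p) in logged])
    rewards = np.array([r for (_, _, r, _) in logged])
    return float(np.mean(weights * rewards))

# Toy log: uniform logging policy over two actions; the new policy always picks action 1.
logged = [("ctx", 0, 0.2, 0.5), ("ctx", 0, 0.3, 0.5), ("ctx", 1, 0.9, 0.5), ("ctx", 1, 0.8, 0.5)]
new_policy = lambda x, a: 1.0 if a == 1 else 0.0
print(ips_value(logged, new_policy))  # ~0.85, the estimated average reward of the new policy
```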
arXiv Detail & Related papers (2021-10-28T04:16:56Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z) - Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z) - News consumption and social media regulations policy [70.31753171707005]
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in users engaging with both types of content, showing a slight preference for questionable content, which may reflect dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z) - Mitigating Political Bias in Language Models Through Reinforced Calibration [6.964628305312507]
We describe metrics for measuring political bias in GPT-2 generation.
We propose a reinforcement learning (RL) framework for mitigating political biases in generated text.
arXiv Detail & Related papers (2021-04-30T07:21:30Z) - Generating Counter Narratives against Online Hate Speech: Data and Strategies [21.098614110697184]
We present a study on how to collect responses to hate effectively.
We employ large-scale unsupervised language models such as GPT-2 for the generation of silver data.
The best annotation strategies/neural architectures can be used for data filtering before expert validation/post-editing.
arXiv Detail & Related papers (2020-04-08T19:35:00Z)