Evidence of a log scaling law for political persuasion with large language models
- URL: http://arxiv.org/abs/2406.14508v1
- Date: Thu, 20 Jun 2024 17:12:38 GMT
- Title: Evidence of a log scaling law for political persuasion with large language models
- Authors: Kobi Hackenburg, Ben M. Tappin, Paul Röttger, Scott Hale, Jonathan Bright, Helen Margetts
- Abstract summary: Large language models can now generate political messages as persuasive as those written by humans.
We generate 720 persuasive messages on 10 U.S. political issues from 24 language models spanning several orders of magnitude in size.
We find evidence of a log scaling law: model persuasiveness is characterized by sharply diminishing returns.
- Score: 3.137594944904106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models can now generate political messages as persuasive as those written by humans, raising concerns about how far this persuasiveness may continue to increase with model size. Here, we generate 720 persuasive messages on 10 U.S. political issues from 24 language models spanning several orders of magnitude in size. We then deploy these messages in a large-scale randomized survey experiment (N = 25,982) to estimate the persuasive capability of each model. Our findings are twofold. First, we find evidence of a log scaling law: model persuasiveness is characterized by sharply diminishing returns, such that current frontier models are barely more persuasive than models smaller in size by an order of magnitude or more. Second, mere task completion (coherence, staying on topic) appears to account for larger models' persuasive advantage. These findings suggest that further scaling model size will not much increase the persuasiveness of static LLM-generated messages.
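
The log scaling law described in the abstract can be made concrete with a small curve fit. The sketch below is illustrative only: the data points are synthetic stand-ins (the paper's actual persuasion estimates are not reproduced here), and the functional form y = a + b·log10(N) is the standard log-scaling parameterization, assumed for illustration rather than taken verbatim from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (synthetic) data: model size in parameters vs. persuasive
# effect in percentage points. These numbers are illustrative stand-ins,
# NOT the paper's measured estimates.
params = np.array([1e8, 1e9, 1e10, 1e11, 1e12])
persuasion_pp = np.array([2.1, 4.0, 5.2, 6.1, 6.7])

def log_law(n, a, b):
    # Log scaling law: persuasiveness grows linearly in log10(model size),
    # i.e., each 10x increase in size adds only a constant b points.
    return a + b * np.log10(n)

(a, b), _ = curve_fit(log_law, params, persuasion_pp)
print(f"fit: persuasion ~ {a:.2f} + {b:.2f} * log10(params)")

# Diminishing returns: the predicted gain from a further 10x scale-up
# is just b points, regardless of how large the model already is.
print(f"predicted gain per 10x scale-up: {b:.2f} percentage points")
```

Under this parameterization, "sharply diminishing returns" falls out directly: each order-of-magnitude increase in size buys the same fixed increment b, which is why frontier models are barely more persuasive than models an order of magnitude smaller.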
Related papers
- Emergent Persuasion: Will LLMs Persuade Without Being Prompted? [13.054065424962046]
We study unprompted persuasion under two scenarios. We show that steering toward traits, both related and unrelated to persuasion, does not reliably increase models' tendency to persuade unprompted.
arXiv Detail & Related papers (2025-12-20T21:09:47Z) - How Persuasive is Your Context? [85.2011141143185]
We introduce the targeted persuasion score (TPS) to quantify how persuasive a given context is to an LM. TPS measures how much a context shifts a model's original answer distribution toward a target distribution. Empirically, through a series of experiments, we show that TPS captures a more nuanced notion of persuasiveness than previously proposed metrics (see the illustrative TPS sketch after this list).
arXiv Detail & Related papers (2025-09-22T15:15:40Z) - Negation: A Pink Elephant in the Large Language Models' Room? [2.8078480738404]
Negations are key to determining sentence meaning, making them essential for logical reasoning.
We investigate how model size and language affect a model's ability to handle negation correctly by evaluating popular language models.
Our datasets can facilitate further research and improvements of language model reasoning in multilingual settings.
arXiv Detail & Related papers (2025-03-28T13:04:41Z) - OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models [55.63479003621053]
We introduce OWLS, an open-access suite of multilingual speech recognition and translation models.
We use OWLS to derive neural scaling laws, showing how final performance can be reliably predicted when scaling.
We show how OWLS can be used to power new research directions by discovering emergent abilities in large-scale speech models.
arXiv Detail & Related papers (2025-02-14T18:51:40Z) - Measuring and Improving Persuasiveness of Large Language Models [12.134372070736596]
We introduce PersuasionBench and PersuasionArena to measure the persuasiveness of generative models automatically.
Our findings carry key implications for both model developers and policymakers.
arXiv Detail & Related papers (2024-10-03T16:36:35Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - Navigating the OverKill in Large Language Models [84.62340510027042]
We investigate the factors behind overkill by exploring how models handle and determine the safety of queries.
Our findings reveal shortcuts within models that lead to over-attention to harmful words like 'kill'; prompts emphasizing safety further exacerbate overkill.
We introduce Self-Contrastive Decoding (Self-CD), a training-free and model-agnostic strategy, to alleviate this phenomenon.
arXiv Detail & Related papers (2024-01-31T07:26:47Z) - Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the abilities of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z) - Turning large language models into cognitive models [0.0]
We show that large language models can be turned into cognitive models.
These models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains.
Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models.
arXiv Detail & Related papers (2023-06-06T18:00:01Z) - Rarely a problem? Language models exhibit inverse scaling in their predictions following few-type quantifiers [0.6091702876917281]
We focus on 'few'-type quantifiers, as in 'few children like toys', which might pose a particular challenge for language models.
We present 960 English sentence stimuli from two human neurolinguistic experiments to 22 autoregressive transformer models of differing sizes.
arXiv Detail & Related papers (2022-12-16T20:01:22Z) - Understanding How Model Size Affects Few-shot Instruction Prompting [0.0]
We investigate how the model size affects the model's ability to discriminate a word's meaning in a given context.
We introduce a dataset called DeltaWords, which evaluates a model's ability to follow instructions.
We show a weak inverse scaling trend, where task accuracy degrades as model size increases.
arXiv Detail & Related papers (2022-12-04T19:59:52Z) - Emergent Abilities of Large Language Models [172.08007363384218]
We consider an ability to be emergent if it is not present in smaller models but is present in larger models.
The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
arXiv Detail & Related papers (2022-06-15T17:32:01Z) - Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
arXiv Detail & Related papers (2022-01-28T02:33:07Z)
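
As referenced in the "How Persuasive is Your Context?" entry above, TPS quantifies how much a context shifts a model's answer distribution toward a target distribution. The sketch below is one plausible reading of that one-line definition, assuming total variation distance as the divergence; the paper's exact formula may differ, and the names here (tps, the toy distributions) are hypothetical.

```python
import numpy as np

def tps(p_orig, p_ctx, p_target):
    """Targeted persuasion score sketch (assumed form, not the paper's exact
    definition): how far the context moves the answer distribution toward
    the target, measured as the reduction in total variation distance,
    normalized by the distance there was to close."""
    p_orig, p_ctx, p_target = map(np.asarray, (p_orig, p_ctx, p_target))
    tv = lambda p, q: 0.5 * np.abs(p - q).sum()  # total variation distance
    before = tv(p_orig, p_target)
    after = tv(p_ctx, p_target)
    return (before - after) / before if before > 0 else 0.0

# Toy example: answer distribution over three options before and after
# conditioning on a persuasive context.
p_orig = [0.7, 0.2, 0.1]    # model's original answers
p_ctx = [0.3, 0.5, 0.2]     # answers after seeing the context
p_target = [0.0, 1.0, 0.0]  # distribution the context argues for
print(f"TPS = {tps(p_orig, p_ctx, p_target):.2f}")  # 0.375: partial shift
```

A score of 1 would mean the context fully collapsed the answers onto the target, 0 means no movement toward it, and intermediate values capture partial persuasion; this graded behavior is what makes such a score more nuanced than a binary flipped/not-flipped metric.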
This list is automatically generated from the titles and abstracts of the papers on this site.