On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial
- URL: http://arxiv.org/abs/2403.14380v1
- Date: Thu, 21 Mar 2024 13:14:40 GMT
- Title: On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial
- Authors: Francesco Salvi, Manoel Horta Ribeiro, Riccardo Gallotti, Robert West
- Abstract summary: We analyze the effect of AI-driven persuasion in a controlled, harmless setting.
We found that participants who debated GPT-4 with access to their personal information had 81.7% higher odds of increased agreement with their opponents compared to participants who debated humans.
- Score: 10.770999939834985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development and popularization of large language models (LLMs) have raised concerns that they will be used to create tailor-made, convincing arguments to push false or misleading narratives online. Early work has found that language models can generate content perceived as at least on par with, and often more persuasive than, human-written messages. However, there is still limited knowledge about LLMs' persuasive capabilities in direct conversations with human counterparts and how personalization can improve their performance. In this pre-registered study, we analyze the effect of AI-driven persuasion in a controlled, harmless setting. We create a web-based platform where participants engage in short, multiple-round debates with a live opponent. Each participant is randomly assigned to one of four treatment conditions, corresponding to a two-by-two factorial design: (1) Games are either played between two humans or between a human and an LLM; (2) Personalization might or might not be enabled, granting one of the two players access to basic sociodemographic information about their opponent. We found that participants who debated GPT-4 with access to their personal information had 81.7% (p < 0.01; N=820 unique participants) higher odds of increased agreement with their opponents compared to participants who debated humans. Without personalization, GPT-4 still outperforms humans, but the effect is lower and statistically non-significant (p=0.31). Overall, our results suggest that concerns around personalization are meaningful and have important implications for the governance of social media and the design of new online environments.
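To make the headline statistic concrete: in a logistic-regression analysis, a coefficient b maps to an odds ratio exp(b), and exp(b) = 1.817 is exactly "81.7% higher odds". Below is a minimal sketch on simulated data; the model formula, variable names, and effect sizes are illustrative assumptions, not the paper's pre-registered specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 820  # unique participants, as reported in the abstract

# Two-by-two factorial assignment: opponent type x personalization.
df = pd.DataFrame({
    "llm_opponent": rng.integers(0, 2, n),  # 0 = human opponent, 1 = GPT-4
    "personalized": rng.integers(0, 2, n),  # 0/1 = access to sociodemographics
})

# Simulated binary outcome standing in for "agreement with the opponent
# increased after the debate" (the real outcome comes from the experiment).
linpred = -0.2 + 0.6 * df["llm_opponent"] * df["personalized"]
df["increased_agreement"] = (
    rng.random(n) < 1.0 / (1.0 + np.exp(-linpred))
).astype(int)

# Logistic regression with an interaction term (illustrative specification).
model = smf.logit(
    "increased_agreement ~ llm_opponent * personalized", data=df
).fit(disp=False)

# A logistic coefficient b corresponds to an odds ratio exp(b); the paper's
# headline figure, 81.7% higher odds, is an odds ratio of about 1.817.
b = model.params["llm_opponent:personalized"]
print(f"odds ratio {np.exp(b):.3f} -> {100 * (np.exp(b) - 1):.1f}% change in odds")
```

With real data in place of the simulation, the same exp(b) transformation recovers the percent change in odds quoted in the abstract.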
Related papers
- Persona Knowledge-Aligned Prompt Tuning Method for Online Debate [42.28019112668135]
We propose a persona knowledge-aligned framework for audience-side argument quality assessment.
This is the first work that leverages the emergence of ChatGPT and injects audience personae knowledge into smaller language models via prompt tuning.
arXiv Detail & Related papers (2024-10-05T17:33:11Z)
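For readers unfamiliar with the prompt-tuning idea mentioned above: a small set of prefix embeddings is trained while the language model itself stays frozen. The sketch below is a generic PyTorch illustration under that assumption, not the paper's actual framework; how persona knowledge initializes or conditions the prefix is left abstract.

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Generic soft prompt tuning: a trainable prefix is prepended to the
    input embeddings while the backbone LM stays frozen. Persona knowledge
    could be injected by initializing or conditioning the prefix."""

    def __init__(self, backbone: nn.Module, n_prompt_tokens: int = 20,
                 d_model: int = 768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # freeze the pretrained LM
            p.requires_grad = False
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor):
        # input_embeds: (batch, seq_len, d_model) token embeddings
        prefix = self.prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        # The backbone is assumed to accept precomputed input embeddings
        # (e.g. HuggingFace models expose an `inputs_embeds` argument).
        return self.backbone(inputs_embeds=torch.cat([prefix, input_embeds], dim=1))
```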
- Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format in which each participant sends one message per turn.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z)
- Large Language Models Can Infer Personality from Free-Form User Interactions [0.0]
GPT-4 can infer personality with moderate accuracy, outperforming previous approaches.
Results show that directly focusing on personality assessment did not lead to a less positive user experience.
Preliminary analyses suggest that the accuracy of personality inferences varies only marginally across different socio-demographic subgroups.
arXiv Detail & Related papers (2024-05-19T20:33:36Z)
- How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test.
This framework engages acquaintances of the target individuals to distinguish between human-written and machine-generated responses.
We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
arXiv Detail & Related papers (2024-04-22T08:00:51Z)
- LLMs Among Us: Generative AI Participating in Digital Discourse [0.0]
"LLMs Among Us" is an experimental framework for bot and human participants to communicate without knowing the ratio or nature of bot and human participants.
We conducted three rounds of the experiment and surveyed participants after each round to measure the ability of LLMs to pose as human participants without human detection.
We found that participants correctly identified the nature of other users in the experiment only 42% of the time, despite knowing that both bots and humans were present.
arXiv Detail & Related papers (2024-02-08T19:21:33Z)
- Can ChatGPT Read Who You Are? [10.577227353680994]
We report the results of a comprehensive user study featuring texts written in Czech by a representative population sample of 155 participants.
We compare the personality trait estimations made by ChatGPT against those by human raters and report ChatGPT's competitive performance in inferring personality traits from text.
arXiv Detail & Related papers (2023-12-26T14:43:04Z)
- Aligning Large Language Models with Human Opinions through Persona Selection and Value-Belief-Norm Reasoning [67.33899440998175]
Chain-of-Opinion (COO) is a simple four-step solution that models which personae to reason with and how to reason with them.
COO distinguishes between explicit personae (demographics and ideology) and implicit personae (historical opinions).
COO efficiently achieves new state-of-the-art opinion prediction via prompting with only 5 inference calls, improving prior techniques by up to 4%.
arXiv Detail & Related papers (2023-11-14T18:48:27Z)
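The summary above does not spell out COO's four steps, so the sketch below is only a loose illustration of persona-conditioned opinion prediction by prompting: an explicit persona (demographics, ideology) and an implicit persona (historical opinions) are folded into one prompt. The `ask_llm` call is a hypothetical stand-in, and the prompt structure is an assumption, not the paper's procedure.

```python
def build_opinion_prompt(explicit_persona: dict, implicit_persona: list[str],
                         question: str) -> str:
    """Fold explicit and implicit personae into one opinion-prediction prompt.

    Loose illustration only; the actual Chain-of-Opinion method uses a
    specific four-step procedure not reproduced here.
    """
    demographics = ", ".join(f"{k}: {v}" for k, v in explicit_persona.items())
    history = "\n".join(f"- {op}" for op in implicit_persona)
    return (
        f"Respondent profile: {demographics}\n"
        f"Previously expressed opinions:\n{history}\n\n"
        "Reason step by step about how this respondent would answer, "
        f"then answer the question as they would.\nQuestion: {question}"
    )

prompt = build_opinion_prompt(
    {"age": "45", "ideology": "moderate"},
    ["Supports renewable energy subsidies.", "Skeptical of large tech firms."],
    "Should social media platforms verify user identities?",
)
# response = ask_llm(prompt)  # hypothetical chat-completion call
```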
- Locally Differentially Private Document Generation Using Zero Shot Prompting [61.20953109732442]
We propose a locally differentially private mechanism called DP-Prompt to counter author de-anonymization attacks.
When DP-Prompt is used with a powerful language model like ChatGPT (gpt-3.5), we observe a notable reduction in the success rate of de-anonymization attacks.
arXiv Detail & Related papers (2023-10-24T18:25:13Z)
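A minimal sketch of the mechanism described above: instead of releasing a document verbatim, release a zero-shot paraphrase of it, with the sampling temperature serving as the privacy knob. The `generate` callable is a hypothetical stand-in for any chat-completion API; the prompt wording and the temperature-to-privacy mapping are simplifications of the paper's construction.

```python
def dp_prompt_sanitize(document: str, temperature: float, generate) -> str:
    """Sketch of the DP-Prompt idea: release a zero-shot paraphrase instead
    of the original text, reducing stylistic cues that identify the author.

    `generate` is a hypothetical callable (prompt, temperature) -> str, e.g.
    a wrapper around a chat-completion API. In the paper, the sampling
    temperature is tied to the differential-privacy guarantee; here it is
    simply passed through.
    """
    prompt = (
        "Paraphrase the following document, preserving its meaning:\n\n"
        + document
    )
    return generate(prompt, temperature=temperature)

# Usage (with some wrapper around an LLM API):
# sanitized = dp_prompt_sanitize(doc, temperature=1.0, generate=my_llm_call)
```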
- Do Large Language Models Show Decision Heuristics Similar to Humans? A Case Study Using GPT-3.5 [0.0]
GPT-3.5 is an LLM that underlies the conversational agent ChatGPT.
In this work, we used a series of novel prompts to determine whether ChatGPT shows biases and other decision effects.
We also tested the same prompts on human participants.
arXiv Detail & Related papers (2023-05-08T01:02:52Z)
- Partner Matters! An Empirical Study on Fusing Personas for Personalized Response Selection in Retrieval-Based Chatbots [51.091235903442715]
This paper explores the impact of using personas that describe either the self or the partner speaker on the task of response selection.
Four persona fusion strategies are designed, each assuming that personas interact with contexts or responses in a different way.
Empirical studies on the Persona-Chat dataset show that the partner personas can improve the accuracy of response selection.
arXiv Detail & Related papers (2021-05-19T10:32:30Z)
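As one simple, generic variant of persona fusion (the paper designs four strategies, not reproduced here), the sketch below concatenates the partner's persona with the context and scores candidate responses by embedding similarity. The `embed` callable is a hypothetical sentence-embedding stand-in.

```python
import numpy as np

def select_response(context: str, partner_persona: str,
                    candidates: list[str], embed) -> str:
    """Pick the candidate response closest to the persona-fused context.

    One simple fusion strategy (persona concatenated with context).
    `embed` is a hypothetical callable str -> np.ndarray, e.g. any
    sentence-embedding model.
    """
    fused = embed(partner_persona + " " + context)
    scores = []
    for cand in candidates:
        vec = embed(cand)
        cos = vec @ fused / (np.linalg.norm(vec) * np.linalg.norm(fused))
        scores.append(cos)
    return candidates[int(np.argmax(scores))]
```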
- M2P2: Multimodal Persuasion Prediction using Adaptive Fusion [65.04045695380333]
This paper addresses two problems: Debate Outcome Prediction (DOP), which predicts who wins a debate, and Intensity of Persuasion Prediction (IPP), which predicts the change in the number of votes before and after a speaker speaks.
Our M2P2 framework is the first to use multimodal (acoustic, visual, language) data to solve the IPP problem.
arXiv Detail & Related papers (2020-06-03T18:47:24Z)
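As a generic illustration of adaptive multimodal fusion, not necessarily M2P2's exact architecture, the sketch below combines acoustic, visual, and language embeddings with learned, input-dependent weights.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Combine per-modality embeddings (e.g. acoustic, visual, language)
    with learned, input-dependent weights. Generic sketch only."""

    def __init__(self, d_model: int, n_modalities: int = 3):
        super().__init__()
        # One scalar gate per modality, computed from the embedding itself.
        self.gates = nn.ModuleList(
            [nn.Linear(d_model, 1) for _ in range(n_modalities)]
        )

    def forward(self, embeddings: list[torch.Tensor]) -> torch.Tensor:
        # embeddings: list of (batch, d_model) tensors, one per modality
        logits = torch.cat(
            [g(e) for g, e in zip(self.gates, embeddings)], dim=1
        )                                         # (batch, n_modalities)
        weights = torch.softmax(logits, dim=1)
        stacked = torch.stack(embeddings, dim=1)  # (batch, n_modalities, d)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)
```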