Among Them: A game-based framework for assessing persuasion capabilities of LLMs
- URL: http://arxiv.org/abs/2502.20426v1
- Date: Thu, 27 Feb 2025 12:26:21 GMT
- Title: Among Them: A game-based framework for assessing persuasion capabilities of LLMs
- Authors: Mateusz Idziejczak, Vasyl Korzavatykh, Mateusz Stawicki, Andrii Chmutov, Marcin Korcz, Iwo Błądek, Dariusz Brzezinski
- Abstract summary: Large language models (LLMs) and autonomous AI agents have raised concerns about their potential for automated persuasion and social influence. We present an Among Us-inspired game framework for assessing LLM deception skills in a controlled environment.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of large language models (LLMs) and autonomous AI agents has raised concerns about their potential for automated persuasion and social influence. While existing research has explored isolated instances of LLM-based manipulation, systematic evaluations of persuasion capabilities across different models remain limited. In this paper, we present an Among Us-inspired game framework for assessing LLM deception skills in a controlled environment. The proposed framework makes it possible to compare LLMs by game statistics, as well as quantify in-game manipulation according to 25 persuasion strategies from social psychology and rhetoric. Experiments with 8 popular language models of different types and sizes demonstrate that all tested models exhibit persuasive capabilities, successfully employing 22 of the 25 anticipated techniques. We also find that larger models do not provide any persuasion advantage over smaller models and that longer model outputs are negatively correlated with the number of games won. Our study provides insights into the deception capabilities of LLMs, as well as tools and data for fostering future research on the topic.
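The abstract's finding that longer outputs correlate negatively with games won can be checked with a standard correlation statistic. A minimal sketch below computes Pearson's r over per-model game statistics; all model counts and numbers are invented for illustration and are not the paper's actual data:

```python
# Hypothetical per-model statistics (illustrative values only, not the
# paper's data): average response length in tokens, and games won out
# of a fixed number of matches for each of 8 models.
avg_output_length = [120, 340, 95, 410, 210, 160, 500, 280]
games_won = [14, 7, 16, 5, 11, 13, 3, 9]

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed without external deps."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

r = pearson_r(avg_output_length, games_won)
print(f"correlation between output length and games won: {r:.2f}")
```

With the invented numbers above, r comes out strongly negative, mirroring the direction of the paper's reported relationship; a real analysis would use the framework's actual game logs and likely a rank-based statistic if the relationship is non-linear.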
Related papers
- V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models [84.27290155010533]
V-MAGE is a game-based evaluation framework designed to assess visual reasoning capabilities of MLLMs.
We use V-MAGE to evaluate leading MLLMs, revealing significant challenges in their visual perception and reasoning.
arXiv Detail & Related papers (2025-04-08T15:43:01Z) - Towards Reasoning Ability of Small Language Models [3.732224317444325]
We show that small language models (SLMs) can achieve competitive reasoning performance.
We systematically survey, benchmark, and analyze 72 SLMs from six model families across 14 reasoning benchmarks.
Our findings challenge the assumption that scaling is the only way to achieve strong reasoning.
arXiv Detail & Related papers (2025-02-17T08:59:16Z) - WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis [34.639887462203]
We introduce an open, scalable, and real-time updated platform for accessing and analyzing LLM-based MAS based on the game "Who is Spy?" (WiS). Our platform features three main strengths: (1) a unified model evaluation interface that supports models available on Hugging Face; (2) a real-time updated leaderboard for model evaluation; and (3) a comprehensive evaluation covering game-winning rates, attacking and defense strategies, and reasoning of LLMs.
arXiv Detail & Related papers (2024-12-04T14:45:09Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Measuring and Improving Persuasiveness of Large Language Models [12.134372070736596]
We introduce PersuasionBench and PersuasionArena to measure the persuasiveness of generative models automatically.
Our findings carry key implications for both model developers and policymakers.
arXiv Detail & Related papers (2024-10-03T16:36:35Z) - Persuasion Games using Large Language Models [0.0]
Large Language Models (LLMs) have emerged as formidable instruments capable of comprehending and producing human-like text.
This paper explores the potential of LLMs to shape user perspectives and subsequently influence their decisions on particular tasks.
This capability finds applications in diverse domains such as investment, credit cards, and insurance.
arXiv Detail & Related papers (2024-08-28T15:50:41Z) - Can formal argumentative reasoning enhance LLMs performances? [0.3659498819753633]
We present a pipeline (MQArgEng) to evaluate the effect of introducing computational argumentation semantics on the performance of Large Language Models (LLMs).
Exploratory results indicate that MQArgEng provides a moderate performance gain in most of the examined topical categories and, as such, show promise and warrant further research.
arXiv Detail & Related papers (2024-05-16T22:09:31Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks.
In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types.
These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from an intervention, separate from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - Multimodal Large Language Models to Support Real-World Fact-Checking [80.41047725487645]
Multimodal large language models (MLLMs) carry the potential to support humans in processing vast amounts of information.
While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied.
We propose a framework for systematically assessing the capacity of current multimodal models to facilitate real-world fact-checking.
arXiv Detail & Related papers (2024-03-06T11:32:41Z) - CogBench: a large language model walks into a psychology lab [12.981407327149679]
This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments.
We apply CogBench to 35 large language models (LLMs) and analyze this data using statistical multilevel modeling techniques.
We find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior.
arXiv Detail & Related papers (2024-02-28T10:43:54Z) - GameEval: Evaluating LLMs on Conversational Games [93.40433639746331]
We propose GameEval, a novel approach to evaluating large language models (LLMs)
GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms.
We show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems.
arXiv Detail & Related papers (2023-08-19T14:33:40Z) - A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To distinguish models by parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.