Limits of Large Language Models in Debating Humans
- URL: http://arxiv.org/abs/2402.06049v2
- Date: Sat, 01 Feb 2025 23:54:28 GMT
- Title: Limits of Large Language Models in Debating Humans
- Authors: James Flamino, Mohammed Shahid Modi, Boleslaw K. Szymanski, Brendan Cross, Colton Mikolajczyk,
- Abstract summary: We rigorously test the limits of agents that debate using Large Language Models (LLMs)
We found that agents can blend in and concentrate on a debate's topic better than humans, improving the productivity of all players.
Yet, humans perceive agents as less convincing and confident than other humans, and several behavioral metrics of humans and agents we collected deviate measurably from each other.
- Score: 0.0
- License:
- Abstract: Large Language Models (LLMs) have shown remarkable promise in communicating with humans. Their potential use as artificial partners with humans in sociological experiments involving conversation is an exciting prospect. But how viable is it? Here, we rigorously test the limits of agents that debate using LLMs in a preregistered study that runs multiple debate-based opinion consensus games. Each game starts with six humans, six agents, or three humans and three agents. We found that agents can blend in and concentrate on a debate's topic better than humans, improving the productivity of all players. Yet, humans perceive agents as less convincing and confident than other humans, and several behavioral metrics of humans and agents we collected deviate measurably from each other. We observed that agents are already decent debaters, but their behavior generates a pattern distinctly different from the human-generated data.
Related papers
- Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z) - Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z) - Enhancing Human Experience in Human-Agent Collaboration: A
Human-Centered Modeling Approach Based on Positive Human Gain [18.968232976619912]
We propose a "human-centered" modeling scheme for collaborative AI agents.
We expect that agents should learn to enhance the extent to which humans achieve these goals while maintaining agents' original abilities.
We evaluate the RLHG agent in the popular Multi-player Online Battle Arena (MOBA) game, Honor of Kings.
arXiv Detail & Related papers (2024-01-28T05:05:57Z) - Theory of Mind abilities of Large Language Models in Human-Robot
Interaction : An Illusion? [18.770522926093786]
Large Language Models have shown exceptional generative abilities in various natural language and generation tasks.
We study a special application of ToM abilities that has higher stakes and possibly irreversible consequences.
We focus on the task of Perceived Behavior Recognition, where a robot employs a Large Language Model (LLM) to assess the robot's generated behavior in a manner similar to human observer.
arXiv Detail & Related papers (2024-01-10T18:09:36Z) - SpeechAgents: Human-Communication Simulation with Multi-Modal
Multi-Agent Systems [53.94772445896213]
Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society.
We propose SpeechAgents, a multi-modal LLM based multi-agent system designed for simulating human communication.
arXiv Detail & Related papers (2024-01-08T15:01:08Z) - The Wisdom of Partisan Crowds: Comparing Collective Intelligence in
Humans and LLM-based Agents [7.986590413263814]
"Wisdom of partisan crowds" is a phenomenon known as the "wisdom of partisan crowds"
We find that partisan crowds display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do.
We identify several factors that interfere with convergence, including the use of chain-of-thought prompt and lack of details in personas.
arXiv Detail & Related papers (2023-11-16T08:30:15Z) - Character-LLM: A Trainable Agent for Role-Playing [67.35139167985008]
Large language models (LLMs) can be used to serve as agents to simulate human behaviors.
We introduce Character-LLM that teach LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc.
arXiv Detail & Related papers (2023-10-16T07:58:56Z) - The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI)
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z) - ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z) - Collaborating with Humans without Human Data [6.158826414652401]
We study the problem of how to train agents that collaborate well with human partners without using human data.
We train our agent partner as the best response to a population of self-play agents and their past checkpoints.
We find that Fictitious Co-Play (FCP) agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners.
arXiv Detail & Related papers (2021-10-15T16:03:57Z) - Imitating Interactive Intelligence [24.95842455898523]
We study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment.
To build agents that can robustly interact with humans, we would ideally train them while they interact with humans.
We use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour.
arXiv Detail & Related papers (2020-12-10T13:55:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.