Related papers: Limits of Large Language Models in Debating Humans

Related papers

Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans [9.315735862658244]
We propose Human-Aligned Bench, a benchmark for alignment of multimodal reasoning with human performance.<n>We collected 9,794 multimodal questions that solely rely on contextual reasoning, including bilingual (Chinese and English) multimodal questions and pure text-based questions.<n>Extensive experiments on the Human-Aligned Bench reveal notable differences between the performance of current MLLMs in multimodal reasoning and human performance.
arXiv Detail & Related papers (2025-05-16T11:41:19Z)
AI persuading AI vs AI persuading Humans: LLMs' Differential Effectiveness in Promoting Pro-Environmental Behavior [70.24245082578167]
Pro-environmental behavior (PEB) is vital to combat climate change, yet turning awareness into intention and action remains elusive. We explore large language models (LLMs) as tools to promote PEB, comparing their impact across 3,200 participants. Results reveal a "synthetic persuasion paradox": synthetic and simulated agents significantly affect their post-intervention PEB stance, while human responses barely shift.
arXiv Detail & Related papers (2025-03-03T21:40:55Z)
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion. We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations. Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment [0.9188951403098383]
Large Language Models (LLMs) are increasingly used as reasoning engines in agentic systems. We present the first embodied and cognitively meaningful evaluation of physical common-sense reasoning in LLMs. We employ the Animal-AI environment, a simulated 3D virtual laboratory, to study physical common-sense reasoning in LLMs.
arXiv Detail & Related papers (2024-10-30T17:28:28Z)
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina [7.155982875107922]
Studies suggest large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse. This has led many to propose that LLMs can be used as surrogates or simulations for humans in social science research. We assess the reasoning depth of LLMs using the 11-20 money request game.
arXiv Detail & Related papers (2024-10-25T14:46:07Z)
Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations. Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time. This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z)
Can Language Models Recognize Convincing Arguments? [12.458437450959416]
Large language models (LLMs) have raised concerns about their potential to create and propagate convincing narratives. We study their performance in detecting convincing arguments to gain insights into their persuasive capabilities.
arXiv Detail & Related papers (2024-03-31T17:38:33Z)
Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human Gain [18.968232976619912]
We propose a "human-centered" modeling scheme for collaborative AI agents. We expect that agents should learn to enhance the extent to which humans achieve these goals while maintaining agents' original abilities. We evaluate the RLHG agent in the popular Multi-player Online Battle Arena (MOBA) game, Honor of Kings.
arXiv Detail & Related papers (2024-01-28T05:05:57Z)
Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion? [18.770522926093786]
Large Language Models have shown exceptional generative abilities in various natural language and generation tasks. We study a special application of ToM abilities that has higher stakes and possibly irreversible consequences. We focus on the task of Perceived Behavior Recognition, where a robot employs a Large Language Model (LLM) to assess the robot's generated behavior in a manner similar to human observer.
arXiv Detail & Related papers (2024-01-10T18:09:36Z)
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems [53.94772445896213]
Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society. We propose SpeechAgents, a multi-modal LLM based multi-agent system designed for simulating human communication.
arXiv Detail & Related papers (2024-01-08T15:01:08Z)
How should the advent of large language models affect the practice of science? [51.62881233954798]
How should the advent of large language models affect the practice of science? We have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate.
arXiv Detail & Related papers (2023-12-05T10:45:12Z)
The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents [7.986590413263814]
"Wisdom of partisan crowds" is a phenomenon known as the "wisdom of partisan crowds" We find that partisan crowds display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do. We identify several factors that interfere with convergence, including the use of chain-of-thought prompt and lack of details in personas.
arXiv Detail & Related papers (2023-11-16T08:30:15Z)
Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding [1.3654846342364308]
Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. This position paper critically assesses three points recurring in critiques of LLM capacities. We outline a pragmatic perspective on the issue of real' understanding and intentionality in LLMs.
arXiv Detail & Related papers (2023-10-30T15:51:04Z)
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues [72.65163468440434]
This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting. We prompt large language models (LLMs) to generate a full multi-turn dialogue based on the ChatSEED, utterance by utterance. We find GPT-4 can generate human-style multi-turn dialogues with impressive quality, significantly outperforms its counterparts.
arXiv Detail & Related papers (2023-10-20T16:53:51Z)
Character-LLM: A Trainable Agent for Role-Playing [67.35139167985008]
Large language models (LLMs) can be used to serve as agents to simulate human behaviors. We introduce Character-LLM that teach LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc.
arXiv Detail & Related papers (2023-10-16T07:58:56Z)
The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI) We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models. Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution. Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data) This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z)
Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs [0.0]
This study aims to investigate the performance of large language models (LLMs) on different reasoning tasks. My findings indicate that LLMs excel at analogical and moral reasoning, yet struggle to perform as proficiently on spatial reasoning tasks.
arXiv Detail & Related papers (2023-03-22T22:53:44Z)
Collaborating with Humans without Human Data [6.158826414652401]
We study the problem of how to train agents that collaborate well with human partners without using human data. We train our agent partner as the best response to a population of self-play agents and their past checkpoints. We find that Fictitious Co-Play (FCP) agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners.
arXiv Detail & Related papers (2021-10-15T16:03:57Z)
Imitating Interactive Intelligence [24.95842455898523]
We study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. We use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour.
arXiv Detail & Related papers (2020-12-10T13:55:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.