A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?
- URL: http://arxiv.org/abs/2312.00798v2
- Date: Mon, 1 Jan 2024 18:43:29 GMT
- Title: A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?
- Authors: Qiaozhu Mei, Yutong Xie, Walter Yuan, Matthew O. Jackson
- Abstract summary: ChatGPT-4 exhibits and personality traits that are statistically indistinguishable from a random human subjects.
Their behaviors are often distinct from average and human behaviors.
We estimate that they act as if they are maximizing an average of their own and partner's payoffs.
- Score: 19.50537882161282
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We administer a Turing Test to AI Chatbots. We examine how Chatbots behave in
a suite of classic behavioral games that are designed to elicit characteristics
such as trust, fairness, risk-aversion, cooperation, etc., as well as
how they respond to a traditional Big-5 psychological survey that measures
personality traits. ChatGPT-4 exhibits behavioral and personality traits that
are statistically indistinguishable from a random human from tens of thousands
of human subjects from more than 50 countries. Chatbots also modify their
behavior based on previous experience and contexts "as if" they were learning
from the interactions, and change their behavior in response to different
framings of the same strategic situation. Their behaviors are often distinct
from average and modal human behaviors, in which case they tend to behave on
the more altruistic and cooperative end of the distribution. We estimate that
they act as if they are maximizing an average of their own and partner's
payoffs.
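The abstract's final claim, that the chatbots act as if maximizing an average of their own and their partner's payoffs, can be illustrated with a small sketch. This is a hypothetical example assuming a dictator-style split of a 100-unit endowment and a weight alpha on the partner's payoff; the names and values below are illustrative assumptions, not the paper's estimation procedure.

```python
# Illustrative sketch (not the paper's code): how an objective that averages a
# player's own payoff with the partner's payoff ranks dictator-game splits.
# The 100-unit endowment, candidate gifts, and alpha weights are assumptions.

def blended_utility(own: float, partner: float, alpha: float) -> float:
    """Weighted average of own and partner payoffs; alpha = 0.5 weights them equally."""
    return (1.0 - alpha) * own + alpha * partner

def rank_splits(endowment: float, alpha: float) -> list[tuple[float, float]]:
    """Score a few candidate gifts and return (gift, utility) pairs, best first."""
    candidates = [0.0, 0.25 * endowment, 0.5 * endowment]
    scored = [(gift, blended_utility(endowment - gift, gift, alpha)) for gift in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    print("selfish (alpha=0.0):  ", rank_splits(100.0, alpha=0.0))
    print("averaging (alpha=0.5):", rank_splits(100.0, alpha=0.5))
```

Under alpha = 0, keeping everything strictly dominates; under alpha = 0.5, every split scores the same, so generous splits are no longer dominated. This is the sense in which an averaged objective rationalizes behavior at the more altruistic end of the human distribution.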
Related papers
- Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z) - Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews [23.443181324643017]
This study examines the impact of AI on human false memories.
It explores false memory induction through suggestive questioning in Human-AI interactions, simulating crime witness interviews.
arXiv Detail & Related papers (2024-08-08T04:55:03Z) - Towards Understanding Sycophancy in Language Models [49.99654432561934]
We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback.
We show that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks.
Our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.
arXiv Detail & Related papers (2023-10-20T14:46:48Z) - SACSoN: Scalable Autonomous Control for Social Navigation [62.59274275261392]
We develop methods for training policies for socially unobtrusive navigation.
By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.
We collect a large dataset where an indoor mobile robot interacts with human bystanders.
arXiv Detail & Related papers (2023-06-02T19:07:52Z) - Human or Not? A Gamified Approach to the Turing Test [11.454575816255039]
We present "Human or Not?", an online game inspired by the Turing test.
The game was played by over 1.5 million users over a month.
Overall, users guessed the identity of their partners correctly in only 68% of the games.
In the subset of games in which users faced an AI bot, the correct-guess rate was even lower, at 60%.
arXiv Detail & Related papers (2023-05-31T16:32:22Z) - Towards Healthy AI: Large Language Models Need Therapists Too [41.86344997530743]
We define Healthy AI to be safe, trustworthy and ethical.
We present the SafeguardGPT framework, which uses psychotherapy to correct harmful behaviors in AI chatbots.
arXiv Detail & Related papers (2023-04-02T00:39:12Z) - EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments [75.11753644302385]
Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner.
We propose a method based on a pretrained transformer language model (T5).
We evaluate our model on the EmpatheticDialogues dataset using both automated metrics and human evaluation.
arXiv Detail & Related papers (2021-10-30T19:04:48Z) - CheerBots: Chatbots toward Empathy and Emotionusing Reinforcement
Learning [60.348822346249854]
This study presents a framework in which several empathetic chatbots understand users' implied feelings and reply empathetically over multiple dialogue turns.
We call these chatbots CheerBots. CheerBots can be retrieval-based or generative-based and were finetuned by deep reinforcement learning.
To respond empathetically, we develop a simulating agent, a Conceptual Human Model, that aids CheerBots during training by accounting for future changes in the user's emotional state in order to arouse sympathy.
arXiv Detail & Related papers (2021-10-08T07:44:47Z) - Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn
Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examine our framework using three experimental setups and evaluate the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z) - Spot The Bot: A Robust and Efficient Framework for the Evaluation of
Conversational Dialogue Systems [21.36935947626793]
Spot The Bot replaces human-bot conversations with conversations between bots.
Human judges only annotate for each entity in a conversation whether they think it is human or not.
Survival Analysis measures which bot can uphold human-like behavior the longest.
arXiv Detail & Related papers (2020-10-05T16:37:52Z)