Does GPT-4 pass the Turing test?
- URL: http://arxiv.org/abs/2310.20216v2
- Date: Sat, 20 Apr 2024 20:47:28 GMT
- Title: Does GPT-4 pass the Turing test?
- Authors: Cameron R. Jones, Benjamin K. Bergen
- Abstract summary: The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%).
We argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception.
- Score: 0.913127392774573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). Participants' decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient to pass the Turing test. Participant knowledge about LLMs and number of games played positively correlated with accuracy in detecting AI, suggesting learning and practice as possible strategies to mitigate deception. Despite known limitations as a test of intelligence, we argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.
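The pass rate in the abstract is the share of games in which the interrogator judged the AI witness to be human. The Python sketch below computes it from per-game verdicts and attaches a 95% Wilson score interval. That interval makes clear why a figure like 49.7% only means something alongside the number of games behind it. The `Game` record, the function names, and the choice of the Wilson interval are illustrative assumptions. The abstract reports neither game counts nor the statistical tests the authors used.

```python
import math
from dataclasses import dataclass


@dataclass
class Game:
    """Hypothetical record of one game (not the authors' data format)."""
    witness: str        # e.g. "GPT-4", "ELIZA", "Human"
    judged_human: bool  # True if the interrogator's verdict was "human"


def pass_rate(games, witness):
    """Return (pass rate, number of games) for one witness type."""
    relevant = [g for g in games if g.witness == witness]
    if not relevant:
        raise ValueError(f"no games for witness {witness!r}")
    passes = sum(g.judged_human for g in relevant)  # True counts as 1
    return passes / len(relevant), len(relevant)


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Usage with made-up verdicts: 10 GPT-4 games, 5 judged human.
games = [Game("GPT-4", i < 5) for i in range(10)]
rate, n = pass_rate(games, "GPT-4")
low, high = wilson_interval(rate, n)
print(f"pass rate {rate:.3f} over {n} games, 95% CI [{low:.3f}, {high:.3f}]")
```

With these ten hypothetical games the point estimate is 0.5, but the interval spans roughly 0.24 to 0.76. It narrows as the number of games grows, which is what makes comparisons against a 50% threshold or the 66% human baseline informative.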
Related papers
- GPT-4 is judged more human than humans in displaced and inverted Turing tests [0.7437224586066946]
Everyday AI detection requires differentiating between people and AI in online conversations.
We measured how well people and large language models can discriminate using two modified versions of the Turing test: inverted and displaced.
Date: 2024-07-11T20:28:24Z
- People cannot distinguish GPT-4 from a human in a Turing test [0.913127392774573]
GPT-4 was judged to be a human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%).
Results have implications for debates around machine intelligence and, more urgently, suggest that deception by current AI systems may go undetected.
Date: 2024-05-09T04:14:09Z
- How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test.
This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses.
We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
Date: 2024-04-22T08:00:51Z
- InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews [57.04431594769461]
This paper introduces a novel perspective to evaluate the personality fidelity of RPAs with psychological scales.
Experiments include various types of RPAs and LLMs, covering 32 distinct characters on 14 widely used psychological scales.
With InCharacter, we show that state-of-the-art RPAs exhibit personalities highly aligned with the human-perceived personalities of the characters, achieving an accuracy up to 80.7%.
Date: 2023-10-27T08:42:18Z
- Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how a lack of it can deepen biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
Date: 2023-04-16T11:22:59Z
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
Date: 2023-03-22T16:51:28Z
- ChatGPT: Jack of all trades, master of none [4.693597927153063]
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT).
We examined ChatGPT's capabilities on 25 diverse analytical NLP tasks.
We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses.
Date: 2023-02-21T15:20:37Z
- The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
Date: 2022-12-08T23:23:39Z
- Human or Machine? Turing Tests for Vision and Language [22.110556671410624]
We systematically benchmark current AIs in their abilities to imitate humans.
Experiments involved testing 769 human agents, 24 state-of-the-art AI agents, 896 human judges, and 8 AI judges.
Results reveal that current AIs are not far from being able to impersonate human judges across different genders, ages, and educational levels.
Date: 2022-11-23T16:16:52Z
- CommonsenseQA 2.0: Exposing the Limits of AI through Gamification [126.85096257968414]
We construct benchmarks that test the abilities of modern natural language understanding models.
In this work, we propose gamification as a framework for data construction.
Date: 2022-01-14T06:49:15Z
This list is automatically generated from the titles and abstracts of the papers in this site.