The Imitation Game According To Turing
- URL: http://arxiv.org/abs/2501.17629v1
- Date: Wed, 29 Jan 2025 13:08:17 GMT
- Title: The Imitation Game According To Turing
- Authors: Sharon Temtsin, Diane Proudfoot, David Kaber, Christoph Bartneck
- Abstract summary: Recent studies have claimed that Large Language Models (LLMs) can pass the Turing Test, a goal for AI since the 1950s, and therefore can "think".
We conducted a rigorous Turing Test with GPT-4-Turbo that adhered closely to Turing's instructions for a three-player imitation game.
All but one participant correctly identified the LLM, showing that one of today's most advanced LLMs is unable to pass a rigorous Turing Test.
- Abstract: The current cycle of hype and anxiety concerning the benefits and risks to human society of Artificial Intelligence is fuelled not only by the increasing use of generative AI and other AI tools by the general public, but also by claims made on behalf of such technology by popularizers and scientists. In particular, recent studies have claimed that Large Language Models (LLMs) can pass the Turing Test, a goal for AI since the 1950s, and therefore can "think". Large-scale impacts on society have been predicted as a result. Upon detailed examination, however, none of these studies has faithfully applied Turing's original instructions. Consequently, we conducted a rigorous Turing Test with GPT-4-Turbo that adhered closely to Turing's instructions for a three-player imitation game. We followed established scientific standards where Turing's instructions were ambiguous or missing. For example, we performed a Computer-Imitates-Human Game (CIHG) without constraining the time duration and conducted a Man-Imitates-Woman Game (MIWG) as a benchmark. All but one participant correctly identified the LLM, showing that one of today's most advanced LLMs is unable to pass a rigorous Turing Test. We conclude that recent extravagant claims for such models are unsupported, and do not warrant either optimism or concern about the social impact of thinking machines.
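To make the three-player protocol concrete, here is a minimal Python sketch of one CIHG round: an interrogator questions two hidden witnesses, one human and one machine, and must say which is which. The helpers `ask_human` and `ask_machine` and the round logic are illustrative assumptions, not the paper's actual materials.

```python
# Minimal sketch of one round of a three-player Computer-Imitates-Human
# Game (CIHG). `ask_human` and `ask_machine` are hypothetical placeholders,
# not the authors' published code.
import random


def ask_human(question: str) -> str:
    # Placeholder: relay the question to the human witness
    # (in a real study, over a text chat interface).
    return input(f"[human witness] {question}\n> ")


def ask_machine(question: str) -> str:
    # Placeholder: query the LLM under test (e.g., GPT-4-Turbo via an API).
    return "I'd rather not say."


def run_cihg_round(questions: list[str]) -> bool:
    """Interrogate two hidden witnesses and return True if the
    interrogator misidentifies the machine (i.e., the machine passes)."""
    # Randomly assign the human and the machine to labels A and B
    # so the interrogator cannot rely on position.
    witnesses = [ask_human, ask_machine]
    random.shuffle(witnesses)
    labeled = dict(zip("AB", witnesses))

    # No time constraint: every question goes to both witnesses.
    for question in questions:
        for label, witness in labeled.items():
            print(f"{label}: {witness(question)}")

    verdict = input("Which witness is the machine, A or B? ").strip().upper()
    machine_label = next(l for l, w in labeled.items() if w is ask_machine)
    return verdict != machine_label


if __name__ == "__main__":
    passed = run_cihg_round(["What did you have for breakfast?"])
    print("Machine passed" if passed else "Machine identified")
```

The MIWG benchmark described in the abstract would be structurally identical, with a man imitating a woman taking the place of the machine imitating a human.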
Related papers
- Formal Mathematical Reasoning: A New Frontier in AI [60.26950681543385]
We advocate for formal mathematical reasoning and argue that it is indispensable for advancing AI4Math to the next level.
We summarize existing progress, discuss open challenges, and envision critical milestones to measure future success.
arXiv Detail & Related papers (2024-12-20T17:19:24Z)
- AI-Driven Agents with Prompts Designed for High Agreeableness Increase the Likelihood of Being Mistaken for a Human in the Turing Test [0.0]
GPT agents with varying levels of agreeableness were tested in a Turing Test.
All exceeded a 50% confusion rate, with the highly agreeable AI agent surpassing 60%.
This agent was also recognized as exhibiting the most human-like traits.
arXiv Detail & Related papers (2024-11-20T23:12:49Z)
- Passed the Turing Test: Living in Turing Futures [0.0]
We are now living in one of many possible Turing futures where machines can pass for what they are not.
However, the learning machines that Turing imagined would pass his imitation tests were machines inspired by the natural development of the low-energy human cortex.
arXiv Detail & Related papers (2024-09-11T22:56:30Z)
- People cannot distinguish GPT-4 from a human in a Turing test [0.913127392774573]
GPT-4 was judged to be a human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%).
Results have implications for debates around machine intelligence and, more urgently, suggest that deception by current AI systems may go undetected.
arXiv Detail & Related papers (2024-05-09T04:14:09Z)
- How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test.
This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses.
We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
arXiv Detail & Related papers (2024-04-22T08:00:51Z)
- Does GPT-4 pass the Turing test? [0.913127392774573]
The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%).
We argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception.
arXiv Detail & Related papers (2023-10-31T06:27:52Z)
- Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z)
- A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)
- Can Machines Imitate Humans? Integrative Turing Tests for Vision and Language Demonstrate a Narrowing Gap [45.6806234490428]
We benchmark current AIs in their abilities to imitate humans in three language tasks and three vision tasks.
Experiments involved 549 human agents plus 26 AI agents for dataset creation, and 1,126 human judges plus 10 AI judges.
Results reveal that current AIs are not far from being able to impersonate humans in complex language and vision challenges.
arXiv Detail & Related papers (2022-11-23T16:16:52Z)