Self-Directed Turing Test for Large Language Models
- URL: http://arxiv.org/abs/2408.09853v1
- Date: Mon, 19 Aug 2024 09:57:28 GMT
- Title: Self-Directed Turing Test for Large Language Models
- Authors: Weiqi Wu, Hongqiu Wu, Hai Zhao,
- Abstract summary: The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
- Score: 56.64615470513102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations. Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time and require continuous human involvement to direct the entire interaction with the test subject. This fails to reflect a natural conversational style and hinders the evaluation of Large Language Models (LLMs) in complex and prolonged dialogues. This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format, allowing more dynamic exchanges by multiple consecutive messages. It further efficiently reduces human workload by having the LLM self-direct the majority of the test process, iteratively generating dialogues that simulate its interaction with humans. With the pseudo-dialogue history, the model then engages in a shorter dialogue with a human, which is paired with a human-human conversation on the same topic to be judged using questionnaires. We introduce the X-Turn Pass-Rate metric to assess the human likeness of LLMs across varying durations. While LLMs like GPT-4 initially perform well, achieving pass rates of 51.9% and 38.9% during 3 turns and 10 turns of dialogues respectively, their performance drops as the dialogue progresses, which underscores the difficulty in maintaining consistency in the long term.
Related papers
- Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset [28.076028584051617]
We introduce NICO, a Natural Interactive COnversation dataset in Chinese.
We first use GPT-4-turbo to generate dialogue drafts and make them cover 20 daily-life topics and 5 types of social interactions.
We define two dialogue-level natural conversation tasks and two sentence-level tasks for identifying and rewriting unnatural sentences.
arXiv Detail & Related papers (2024-08-18T02:06:25Z) - Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations [50.698517967337885]
We introduce a novel textbf-by-Step Dialogue Paradigm (Stephanie), designed to mimic the ongoing dynamic nature of human conversations.
By employing a dual learning strategy and a further-split post-editing method, we generated and utilized a high-quality step-by-step dialogue dataset.
Tailored automatic and human evaluations are conducted to assess its effectiveness compared to the traditional single-step dialogue paradigm.
arXiv Detail & Related papers (2024-07-04T17:59:41Z) - LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z) - PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models [4.283022729693451]
We present a novel end-to-end personality-based synthetic dialogue data generation pipeline, specifically designed to elicit responses from large language models via prompting.
We introduce PSYDIAL, the first Korean dialogue dataset focused on personality-based dialogues, curated using our proposed pipeline.
Experimental results indicate that while pre-trained models and those fine-tuned with a chit-chat dataset struggle to generate responses reflecting personality, models trained with PSYDIAL show significant improvements.
arXiv Detail & Related papers (2024-04-01T05:19:34Z) - BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues [72.65163468440434]
This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting.
We prompt large language models (LLMs) to generate a full multi-turn dialogue based on the ChatSEED, utterance by utterance.
We find GPT-4 can generate human-style multi-turn dialogues with impressive quality, significantly outperforms its counterparts.
arXiv Detail & Related papers (2023-10-20T16:53:51Z) - PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z) - Controllable Dialogue Simulation with In-Context Learning [39.04491297557292]
textscDialogic is a dialogue simulation method based on large language model in-context learning.
Our method can rapidly expand a small set of dialogue data with minimum or zero human involvement.
Our simulated dialogues have near-human fluency and annotation accuracy.
arXiv Detail & Related papers (2022-10-09T06:32:58Z) - TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented
Dialogue [113.45485470103762]
In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling.
To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling.
arXiv Detail & Related papers (2020-04-15T04:09:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.