Stephanie2: Thinking, Waiting, and Making Decisions Like Humans in Step-by-Step AI Social Chat
- URL: http://arxiv.org/abs/2601.05657v1
- Date: Fri, 09 Jan 2026 09:27:17 GMT
- Title: Stephanie2: Thinking, Waiting, and Making Decisions Like Humans in Step-by-Step AI Social Chat
- Authors: Hao Yang, Hongyuan Lu, Dingkang Yang, Wenliang Yang, Peng Sun, Xiaochuan Zhang, Jun Xiao, Kefan He, Wai Lam, Yang Liu, Xinhua Zeng
- Abstract summary: Stephanie2 is a novel next-generation step-wise decision-making dialogue agent. With active waiting and message-pace adaptation, Stephanie2 explicitly decides at each step whether to send or wait. Experiments show that Stephanie2 clearly outperforms Stephanie1 on metrics such as naturalness and engagement.
- Score: 60.51107098103245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instant-messaging human social chat typically progresses through a sequence of short messages. Existing step-by-step AI chatting systems typically split a one-shot generation into multiple messages and send them sequentially, but they lack an active waiting mechanism and exhibit unnatural message pacing. To address these issues, we propose Stephanie2, a novel next-generation step-wise decision-making dialogue agent. With active waiting and message-pace adaptation, Stephanie2 explicitly decides at each step whether to send or wait, and models latency as the sum of thinking time and typing time to achieve more natural pacing. We further introduce a time-window-based dual-agent dialogue system to generate pseudo dialogue histories for human and automatic evaluations. Experiments show that Stephanie2 clearly outperforms Stephanie1 on metrics such as naturalness and engagement, and achieves a higher pass rate on human evaluation with the role identification Turing test.
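The pacing mechanism described in the abstract can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: the function names, the typing speed, and the thinking-time formula are all assumptions. It shows the two ideas the abstract names: a per-step SEND/WAIT decision, and latency modeled as thinking time plus typing time.

```python
# Hypothetical sketch of step-wise send/wait pacing (all names and
# parameters are illustrative assumptions, not from the paper).

TYPING_SPEED_CPS = 6.0  # assumed typing speed, characters per second


def thinking_time(message: str) -> float:
    # Placeholder heuristic: longer messages take longer to "think" about.
    return 0.5 + 0.05 * len(message.split())


def typing_time(message: str) -> float:
    # Time to type the message at the assumed typing speed.
    return len(message) / TYPING_SPEED_CPS


def step_decision(pending: list[str], user_is_typing: bool) -> str:
    # At each step: wait if the user is still typing or nothing is
    # queued to send; otherwise send the next short message.
    if user_is_typing or not pending:
        return "WAIT"
    return "SEND"


def simulate(pending: list[str]) -> list[tuple[str, float]]:
    # Drain the queue, attaching the modeled latency (thinking + typing)
    # to each sent message.
    sent = []
    while pending:
        if step_decision(pending, user_is_typing=False) == "SEND":
            msg = pending.pop(0)
            latency = thinking_time(msg) + typing_time(msg)
            sent.append((msg, round(latency, 2)))
    return sent
```

For example, `simulate(["hi", "how are you?"])` yields each message paired with a latency that grows with message length, which is the "more natural pacing" effect the abstract claims.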
Related papers
- Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction [32.28977425466535]
We conduct the first Turing test for S2S systems, collecting 2,968 human judgments on dialogues between 9 state-of-the-art S2S systems and 28 human participants. No evaluated S2S system passes the test, revealing a significant gap in human-likeness. We develop a fine-grained taxonomy of 18 human-likeness dimensions and crowd-annotate our collected dialogues accordingly.
arXiv Detail & Related papers (2026-02-27T15:15:31Z) - The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era [95.35748535806744]
We launch the first Human-like Spoken Dialogue Systems Challenge (HumDial) at ICASSP 2026. This paper summarizes the dataset, track configurations, and the final results.
arXiv Detail & Related papers (2026-01-09T06:32:30Z) - Chronological Thinking in Full-Duplex Spoken Dialogue Language Models [66.84843878538207]
Chronological Thinking aims to improve response quality in full-duplex spoken dialogue language models (SDLMs). It adds no additional latency: once the user stops speaking, the agent halts thinking and begins speaking without further delay. Experiments demonstrate the effectiveness of chronological thinking through both objective metrics and human evaluations.
arXiv Detail & Related papers (2025-10-02T10:28:11Z) - X-TURING: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue Agents [56.64615470513102]
The Turing test examines whether AIs exhibit human-like behaviour in natural language conversations. The traditional setting limits each participant to one message at a time and requires constant human participation. This paper proposes X-Turing, which enhances the original test with a burst dialogue pattern.
arXiv Detail & Related papers (2024-08-19T09:57:28Z) - Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations [50.698517967337885]
We introduce a novel Step-by-Step Dialogue Paradigm (Stephanie), designed to mimic the ongoing dynamic nature of human conversations.
By employing a dual learning strategy and a further-split post-editing method, we generated and utilized a high-quality step-by-step dialogue dataset.
Tailored automatic and human evaluations are conducted to assess its effectiveness compared to the traditional single-step dialogue paradigm.
arXiv Detail & Related papers (2024-07-04T17:59:41Z) - Can You be More Social? Injecting Politeness and Positivity into
Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater user responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z) - "Wait, I'm Still Talking!" Predicting the Dialogue Interaction Behavior
Using Imagine-Then-Arbitrate Model [24.560203199376478]
In real human-human conversations, humans often send several short messages in sequence for readability rather than one long message per turn.
We propose a novel Imagine-then-Arbitrate (ITA) neural dialogue model to help the agent decide whether to wait or to make a response directly.
arXiv Detail & Related papers (2020-02-22T04:05:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.