ConsistentChat: Building Skeleton-Guided Consistent Multi-Turn Dialogues for Large Language Models from Scratch
- URL: http://arxiv.org/abs/2506.03558v2
- Date: Thu, 25 Sep 2025 03:47:02 GMT
- Title: ConsistentChat: Building Skeleton-Guided Consistent Multi-Turn Dialogues for Large Language Models from Scratch
- Authors: Jiawei Chen, Xinyan Guan, Qianhao Yuan, Guozhao Mo, Weixiang Zhou, Yaojie Lu, Hongyu Lin, Ben He, Le Sun, Xianpei Han,
- Abstract summary: Skeleton-Guided Multi-Turn Dialogue Generation constrains multi-turn instruction synthesis by explicitly modeling human intent.<n>We construct ConsistentChat, a multi-turn instruction dataset with approximately 15,000 multi-turn conversations and 224,392 utterances.<n>Experiments on the Light, Topdial, and MT-Eval benchmarks show that models fine-tuned on ConsistentChat achieve a 20-30% improvement in chat consistency and up to a 15% increase in task success rate.
- Score: 79.12929103519922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current instruction data synthesis methods primarily focus on single-turn instructions and often neglect cross-turn coherence, resulting in context drift and reduced task completion rates in extended conversations. To address this limitation, we propose Skeleton-Guided Multi-Turn Dialogue Generation, a framework that constrains multi-turn instruction synthesis by explicitly modeling human conversational intent. It operates in two stages: (1) Intent Modeling, which captures the global structure of human dialogues by assigning each conversation to one of nine well-defined intent trajectories, ensuring a coherent and goal-oriented information flow; and (2) Skeleton Generation, which constructs a structurally grounded sequence of user queries aligned with the modeled intent, thereby serving as a scaffold that constrains and guides the downstream instruction synthesis process. Based on this process, we construct ConsistentChat, a multi-turn instruction dataset with approximately 15,000 multi-turn conversations and 224,392 utterances. Experiments on the Light, Topdial, and MT-Eval benchmarks show that models fine-tuned on ConsistentChat achieve a 20-30% improvement in chat consistency and up to a 15% increase in task success rate, significantly outperforming models trained on existing single-turn and multi-turn instruction datasets.
Related papers
- ChatUMM: Robust Context Tracking for Conversational Interleaved Generation [44.19929499646892]
Unified multimodal models (UMMs) have achieved remarkable progress yet remain constrained by a single-turn interaction paradigm.<n>We present ChatUMM, a conversational unified model that excels at robust context tracking to sustain interleaved multimodal generation.<n>ChatUMM derives its capabilities from an interleaved multi-turn training strategy that models serialized text-image streams as a continuous conversational flow.
arXiv Detail & Related papers (2026-02-06T07:11:50Z) - One Battle After Another: Probing LLMs' Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework [51.50565654314582]
Large language models can follow users' instructions throughout a dialogue spanning multiple topics.<n>Existing benchmarks are often limited to a fixed number of turns, making them susceptible to saturation and failing to account for the user's interactive experience.<n>We propose a framework for assessing multi-turn instruction-following ability.
arXiv Detail & Related papers (2025-11-05T14:39:59Z) - Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech [107.81472531864195]
Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions.
We present Dynamic-SUPERB, a benchmark for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion.
arXiv Detail & Related papers (2023-09-18T06:43:30Z) - Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics.
We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context.
Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z) - Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z) - TANet: Thread-Aware Pretraining for Abstractive Conversational
Summarization [27.185068253347257]
We build a large-scale (11M) pretraining dataset called RCS based on the multi-person discussions in the Reddit community.
We then present TANet, a thread-aware Transformer-based network.
Unlike the existing pre-trained models that treat a conversation as a sequence of sentences, we argue that the inherent contextual dependency plays an essential role in understanding the entire conversation.
arXiv Detail & Related papers (2022-04-09T16:08:46Z) - Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken
Conversations [22.894541507068933]
This paper presents our approach to build generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10.
We employ extensive data augmentation strategies on written data, including artificial error injection and round-trip text-speech transformation.
Our approach ranks third on the objective evaluation and second on the final official human evaluation.
arXiv Detail & Related papers (2022-03-08T12:26:57Z) - Structure-Aware Abstractive Conversation Summarization via Discourse and
Action Graphs [22.58861442978803]
We propose to explicitly model the rich structures in conversations for more precise and accurate conversation summarization.
We incorporate discourse relations between utterances and action triples in utterances through structured graphs to better encode conversations.
Experiments show that our proposed models outperform state-of-the-art methods and generalize well in other domains.
arXiv Detail & Related papers (2021-04-16T23:04:52Z) - Multi-View Sequence-to-Sequence Models with Conversational Structure for
Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z) - Video-Grounded Dialogues with Pretrained Generation Language Models [88.15419265622748]
We leverage the power of pre-trained language models for improving video-grounded dialogue.
We propose a framework by formulating sequence-to-grounded dialogue tasks as a sequence-to-grounded task.
Our framework allows fine-tuning language models to capture dependencies across multiple modalities.
arXiv Detail & Related papers (2020-06-27T08:24:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.