Conversational Behavior Modeling Foundation Model With Multi-Level Perception
- URL: http://arxiv.org/abs/2602.11065v1
- Date: Wed, 11 Feb 2026 17:32:52 GMT
- Title: Conversational Behavior Modeling Foundation Model With Multi-Level Perception
- Authors: Dingkun Zhou, Shuchang Pan, Jiachen Lian, Siddharth Banerjee, Sarika Pasumarthy, Dhruv Hebbar, Siddhant Patel, Zeyi Austin Li, Kan Jen Cheng, Sanay Bordia, Krish Patel, Akshaj Gupta, Tingle Li, Gopala Anumanchipalli,
- Abstract summary: We introduce framework models that predict intent and then reasons over conversational behaviors via a Graph-of-Thoughts (GoT)<n>GoT structures streaming predictions as an evolving graph, enabling a transformer to forecast the next speech act generate concise justifications for its decisions.<n>Experiments show that the framework delivers robust behavior detection, produces interpretable reasoning chains, and establishes a foundation for benchmarking conversational reasoning in full duplex spoken dialogue systems.
- Score: 13.659870465634228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human conversation is organized by an implicit chain of thoughts that manifests as timed speech acts. Capturing this perceptual pathway is key to building natural full-duplex interactive systems. We introduce a framework that models this process as multi-level perception, and then reasons over conversational behaviors via a Graph-of-Thoughts (GoT). Our approach formalizes the intent-to-action pathway with a hierarchical labeling scheme, predicting high-level communicative intents and low-level speech acts to learn their causal and temporal dependencies. To train this system, we develop a high quality corpus that pairs controllable, event-rich dialogue data with human-annotated labels. The GoT framework structures streaming predictions as an evolving graph, enabling a transformer to forecast the next speech act, generate concise justifications for its decisions, and dynamically refine its reasoning. Experiments on both synthetic and real duplex dialogues show that the framework delivers robust behavior detection, produces interpretable reasoning chains, and establishes a foundation for benchmarking conversational reasoning in full duplex spoken dialogue systems.
Related papers
- F-Actor: Controllable Conversational Behaviour in Full-Duplex Models [70.48189107402145]
We present first open, instruction-following full-stage conversational speech model that can be trained efficiently under typical academic resource constraints.<n>Our model requires just 2,000 hours of data, without relying on large-scale pretraining or multi-stage pretraining.<n>Both the model and training code will be released to enable reproducible research on controllable full-like controllable full-stage speech systems.
arXiv Detail & Related papers (2026-01-16T14:25:57Z) - Enabling Conversational Behavior Reasoning Capabilities in Full-Duplex Speech [15.41279444168073]
We introduce framework enabling reasoning over conversational behaviors by modeling this process as causal inference within a Graph ofThoughts (GoT)<n>We develop a hybrid corpus that pairs controllable, event-rich simulations with humanannotated rationales and real conversational speech.<n>GoT framework structures streaming predictions as an evolving graph, enabling a multimodal transformer to forecast the next speech act.
arXiv Detail & Related papers (2025-12-25T15:00:50Z) - Chronological Thinking in Full-Duplex Spoken Dialogue Language Models [66.84843878538207]
Chronological Thinking aims to improve response quality in full SDLMs.<n>No additional latency: once the user stops speaking, the agent halts thinking and begins speaking without further delay.<n>Results: Experiments demonstrate the effectiveness of chronological thinking through both objective metrics and human evaluations.
arXiv Detail & Related papers (2025-10-02T10:28:11Z) - Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on realtime conversations from user interactions.<n>We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback.<n>Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z) - Unsupervised Mutual Learning of Discourse Parsing and Topic Segmentation in Dialogue [37.618612723025784]
In dialogue systems, discourse plays a crucial role in managing conversational focus and coordinating interactions.<n>It consists of two key structures: rhetorical structure and topic structure.<n>We introduce a unified representation that integrates rhetorical and topic structures, ensuring semantic consistency between them.<n>We propose an unsupervised mutual learning framework (UMLF) that jointly models rhetorical and topic structures, allowing them to mutually reinforce each other without requiring additional annotations.
arXiv Detail & Related papers (2024-05-30T08:10:50Z) - Conversation Derailment Forecasting with Graph Convolutional Networks [6.251188655534379]
We propose a novel model based on a graph convolutional neural network that considers dialogue user dynamics and the influence of public perception on conversation utterances.
Our model effectively captures conversation dynamics and outperforms the state-of-the-art models on the CGA and CMV benchmark datasets by 1.5% and 1.7%, respectively.
arXiv Detail & Related papers (2023-06-22T15:40:59Z) - Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics.
We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context.
Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z) - Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z) - Discovering Dialog Structure Graph for Open-Domain Dialog Generation [51.29286279366361]
We conduct unsupervised discovery of dialog structure from chitchat corpora.
We then leverage it to facilitate dialog generation in downstream systems.
We present a Discrete Variational Auto-Encoder with Graph Neural Network (DVAE-GNN), to discover a unified human-readable dialog structure.
arXiv Detail & Related papers (2020-12-31T10:58:37Z) - DialogBERT: Discourse-Aware Response Generation via Learning to Recover
and Rank Utterances [18.199473005335093]
This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models.
To efficiently capture the discourse-level coherence among utterances, we propose two training objectives, including masked utterance regression.
Experiments on three multi-turn conversation datasets show that our approach remarkably outperforms the baselines.
arXiv Detail & Related papers (2020-12-03T09:06:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.