D-SMART: Enhancing LLM Dialogue Consistency via Dynamic Structured Memory And Reasoning Tree
- URL: http://arxiv.org/abs/2510.13363v1
- Date: Wed, 15 Oct 2025 09:53:11 GMT
- Title: D-SMART: Enhancing LLM Dialogue Consistency via Dynamic Structured Memory And Reasoning Tree
- Authors: Xiang Lei, Qin Li, Min Zhang, Min Zhang
- Abstract summary: Large Language Models (LLMs) often exhibit factual inconsistencies and logical decay in extended, multi-turn dialogues. We propose D-SMART, a model-agnostic framework designed to maintain multi-turn dialogue consistency. We introduce new NLI-based metrics to better measure multi-turn dialogue consistency.
- Score: 22.420810089099614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) often exhibit factual inconsistencies and logical decay in extended, multi-turn dialogues, a challenge stemming from their reliance on static, pre-trained knowledge and their inability to reason adaptively over the dialogue history. Prevailing mitigation strategies, such as Retrieval-Augmented Generation (RAG) and agentic working memories, improve information recall but still draw on fundamentally static knowledge sources and follow a single, pre-defined reasoning path. This limits their ability to keep responses factually and logically consistent as the context of a multi-turn dialogue evolves. To address this issue, we propose D-SMART, a model-agnostic framework that maintains multi-turn dialogue consistency by enabling LLMs to build and reason over a dynamic, structured representation of the conversational context. It comprises two synergistic components: (1) a Dynamic Structured Memory (DSM), which incrementally constructs and maintains an authoritative, OWL-compliant knowledge graph of the conversation; and (2) a Reasoning Tree (RT), which executes inferences as an explicit and traceable multi-step search over the graph. Because the widely used quality score (judged by GPT-4) can overlook logical flaws, we introduce new NLI-based metrics to better measure multi-turn dialogue consistency. Comprehensive experiments on the MT-Bench-101 benchmark show that D-SMART significantly outperforms state-of-the-art baselines, raising the dialogue consistency score by over 48% for both proprietary and open-source models and notably improving the quality score of the latter by up to 10.1%.
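The abstract does not spell out how the DSM and RT are implemented. As a rough illustration only, the Python sketch below models the memory as a set of (subject, relation, object) triples in which hypothetical single-valued relations (loosely modeled on OWL functional properties) are overwritten rather than duplicated, and it swaps in a plain breadth-first search for the paper's Reasoning Tree, returning the chain of triples it followed as an explicit, traceable path. `DynamicMemory`, `reasoning_search`, and `FUNCTIONAL_RELATIONS` are hypothetical names, not the authors' API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical single-valued relations, loosely modeled on owl:FunctionalProperty:
# a subject may hold at most one value for each of these, so a newer statement
# replaces the older one instead of coexisting with it.
FUNCTIONAL_RELATIONS = {"lives_in", "works_at", "age"}


@dataclass
class DynamicMemory:
    """Toy stand-in for a dynamic structured memory: a set of (s, p, o) triples."""
    triples: Set[Triple] = field(default_factory=set)

    def update(self, subj: str, rel: str, obj: str) -> None:
        # Drop the stale value of a single-valued relation before adding the new
        # one, so the graph never holds two conflicting facts for that subject.
        if rel in FUNCTIONAL_RELATIONS:
            self.triples = {t for t in self.triples if not (t[0] == subj and t[1] == rel)}
        self.triples.add((subj, rel, obj))

    def neighbors(self, node: str) -> List[Tuple[str, str]]:
        # Outgoing (relation, object) edges, sorted for a deterministic traversal.
        return sorted((p, o) for (s, p, o) in self.triples if s == node)


def reasoning_search(memory: DynamicMemory, start: str, goal: str,
                     max_depth: int = 3) -> Optional[List[Triple]]:
    """Breadth-first multi-hop search (a stand-in for the paper's Reasoning Tree).

    Returns the list of triples traversed from start to goal, i.e. a traceable
    evidence path, or None if goal is unreachable within max_depth hops.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_depth:
            continue  # hop budget exhausted on this branch
        for rel, obj in memory.neighbors(node):
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, path + [(node, rel, obj)]))
    return None


if __name__ == "__main__":
    mem = DynamicMemory()
    mem.update("Alice", "lives_in", "Paris")
    mem.update("Paris", "located_in", "France")
    mem.update("Alice", "lives_in", "Berlin")  # the user corrects an earlier turn
    mem.update("Berlin", "located_in", "Germany")
    print(reasoning_search(mem, "Alice", "Germany"))
    # [('Alice', 'lives_in', 'Berlin'), ('Berlin', 'located_in', 'Germany')]
    print(reasoning_search(mem, "Alice", "France"))
    # None
```

Overwriting the single-valued relation is what keeps this toy graph from holding both "Alice lives in Paris" and "Alice lives in Berlin" after a correction. The actual D-SMART update and search policies (including any LLM-guided node expansion or scoring) are not described in this listing.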
Related papers
- KnowMT-Bench: Benchmarking Knowledge-Intensive Long-Form Question Answering in Multi-Turn Dialogues [58.305425399644086]
Multi-Turn Long-Form Question Answering (MT-LFQA) is a key application paradigm of Large Language Models (LLMs) in knowledge-intensive domains. We introduce KnowMT-Bench, the first-ever benchmark designed to systematically evaluate MT-LFQA for LLMs across knowledge-intensive fields.
arXiv Detail & Related papers (2025-09-26T04:32:29Z)
- Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context [0.0]
We propose a hybrid agentic memory architecture that enriches vector-based storage with explicit linguistic cues to improve recall of nuanced, context-rich exchanges. Experiments on adapted long-term dialogue datasets show that semantic anchoring improves factual recall and discourse coherence by up to 18% over strong RAG baselines.
arXiv Detail & Related papers (2025-08-18T05:14:48Z)
- DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs [54.4857963044859]
We propose DialogueReason, a reasoning paradigm that uncovers the lost roles in monologue-style reasoning models. Our work consists of an analysis of monologue reasoning patterns and the development of a dialogue-based reasoning approach.
arXiv Detail & Related papers (2025-05-11T16:39:58Z)
- In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents [70.12342024019044]
Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information limits their effectiveness. We propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections. RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.
arXiv Detail & Related papers (2025-03-11T04:15:52Z)
- Advancing Multi-Party Dialogue Framework with Speaker-ware Contrastive Learning [10.678477576849579]
We propose a Contrastive learning-based Multi-party dialogue Response generation framework (CMR). CMR employs a two-stage self-supervised contrastive learning framework. Experimental results demonstrate that CMR not only significantly outperforms state-of-the-art models but also generalizes well to large pre-trained language models.
arXiv Detail & Related papers (2025-01-20T06:28:22Z)
- MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues [58.33076950775072]
MT-Bench-101 is designed to evaluate the fine-grained abilities of Large Language Models (LLMs) in multi-turn dialogues.
We construct a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks.
We then evaluate 21 popular LLMs based on MT-Bench-101, conducting comprehensive analyses from both ability and task perspectives.
arXiv Detail & Related papers (2024-02-22T18:21:59Z)
- UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems [43.266153244137215]
Large Language Models (LLMs) have shown exceptional capabilities in many natural language understanding and generation tasks.
We decompose the use of multiple sources in generating personalized response into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation.
We propose a novel Unified Multi-Source Retrieval-Augmented Generation system (UniMS-RAG).
arXiv Detail & Related papers (2024-01-24T06:50:20Z)
- InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models [9.611864685207056]
We propose a novel approach, InstructERC, to reformulate the emotion recognition task from a discriminative framework to a generative framework based on Large Language Models (LLMs).
InstructERC makes three significant contributions: (1) it introduces a simple yet effective retrieval template module, which helps the model explicitly integrate multi-granularity dialogue supervision information; (2) it introduces two additional emotion alignment tasks, namely speaker identification and emotion prediction, to implicitly model dialogue role relationships and future emotional tendencies in conversations; and (3) it is the first to unify emotion labels across benchmarks through the feeling wheel to fit real application scenarios.
arXiv Detail & Related papers (2023-09-21T09:22:07Z)
- 'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges [65.03196674816772]
Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee.
Addressees usually detect such ambiguities immediately and work with the speaker to repair them using meta-communicative Clarification Exchanges (CE): a Clarification Request (CR) and a response.
Here, we argue that the ability to generate and respond to CRs imposes specific constraints on the architecture and objective functions of multi-modal, visually grounded dialogue models.
arXiv Detail & Related papers (2023-07-28T13:44:33Z)
- Dual Semantic Knowledge Composed Multimodal Dialog Systems [114.52730430047589]
We propose a novel multimodal task-oriented dialog system named MDS-S2.
It acquires context-related attribute and relation knowledge from the knowledge base.
We also devise a set of latent query variables to distill the semantic information from the composed response representation.
arXiv Detail & Related papers (2023-05-17T06:33:26Z)
- DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances [18.199473005335093]
This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models.
To efficiently capture the discourse-level coherence among utterances, we propose two training objectives, including masked utterance regression.
Experiments on three multi-turn conversation datasets show that our approach remarkably outperforms the baselines.
arXiv Detail & Related papers (2020-12-03T09:06:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.