Related papers: CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation

URL: http://arxiv.org/abs/2512.21715v1
Date: Thu, 25 Dec 2025 15:33:25 GMT
Title: CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation
Authors: Rui Ke, Jiahui Xu, Shenghao Yang, Kuang Wang, Feng Jiang, Haizhou Li
Abstract summary: We propose a unified framework that integrates three core components: context-aware topic representation, preference-guided topic clustering, and a hierarchical theme generation mechanism. Experiments on a multi-domain customer dialogue benchmark (DSTC-12) demonstrate the effectiveness of CATCH with 8B LLM in both theme clustering and topic generation quality.
Score: 33.065240934374586
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Theme detection is a fundamental task in user-centric dialogue systems, aiming to identify the latent topic of each utterance without relying on predefined schemas. Unlike intent induction, which operates within fixed label spaces, theme detection requires cross-dialogue consistency and alignment with personalized user preferences, posing significant challenges. Existing methods often struggle with sparse, short utterances for accurate topic representation and fail to capture user-level thematic preferences across dialogues. To address these challenges, we propose CATCH (Controllable Theme Detection with Contextualized Clustering and Hierarchical Generation), a unified framework that integrates three core components: (1) context-aware topic representation, which enriches utterance-level semantics using surrounding topic segments; (2) preference-guided topic clustering, which jointly models semantic proximity and personalized feedback to align themes across dialogue; and (3) a hierarchical theme generation mechanism designed to suppress noise and produce robust, coherent topic labels. Experiments on a multi-domain customer dialogue benchmark (DSTC-12) demonstrate the effectiveness of CATCH with 8B LLM in both theme clustering and topic generation quality.
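The abstract does not give implementation details, so the following is only a toy sketch of the idea behind component (2): combining semantic proximity with user feedback before clustering. Everything in it is an assumption made for illustration, not the authors' method: bag-of-words vectors stand in for learned utterance representations, feedback is modeled as hypothetical `should_link` / `cannot_link` index pairs, and the functions `bow`, `cosine`, `preference_similarity`, and `cluster` are invented names.

```python
from collections import Counter
from math import sqrt


def bow(text):
    """Bag-of-words vector (assumed stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def preference_similarity(vecs, should_link=(), cannot_link=(), bonus=0.5):
    """Pairwise cosine similarity, adjusted by hypothetical user feedback."""
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    for i, j in should_link:  # user says: same theme
        sim[i][j] = sim[j][i] = min(1.0, sim[i][j] + bonus)
    for i, j in cannot_link:  # user says: different themes
        sim[i][j] = sim[j][i] = 0.0
    return sim


def cluster(sim, threshold=0.4):
    """Greedy single pass: join the first cluster whose mean similarity
    to the item reaches the threshold, otherwise open a new cluster."""
    clusters = []
    for i in range(len(sim)):
        for members in clusters:
            if sum(sim[i][j] for j in members) / len(members) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


utterances = [
    "I lost my card",
    "please block my card",
    "what is my balance",
    "check account balance",
]
vecs = [bow(u) for u in utterances]

# Semantic proximity alone leaves the two balance questions apart.
print(cluster(preference_similarity(vecs)))  # [[0, 1], [2], [3]]

# A single piece of feedback merges them into one theme.
print(cluster(preference_similarity(vecs, should_link=[(2, 3)])))  # [[0, 1], [2, 3]]
```

In this sketch the feedback only edits the similarity matrix, so any clustering routine that consumes pairwise similarities could replace the greedy pass.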

Related papers

Controllable Conversational Theme Detection Track at DSTC 12 [24.160077192565087]
We introduce Theme Detection as a critical task in conversational analytics. Unlike traditional dialog intent detection, themes are intended as a direct, user-facing summary of the conversation's core inquiry. We pose the Controllable Conversational Theme Detection problem as a public competition track at Dialog System Technology Challenge 12.
arXiv Detail & Related papers (2025-08-26T08:10:01Z)
Unsupervised Mutual Learning of Discourse Parsing and Topic Segmentation in Dialogue [37.618612723025784]
In dialogue systems, discourse plays a crucial role in managing conversational focus and coordinating interactions. It consists of two key structures: rhetorical structure and topic structure. We introduce a unified representation that integrates rhetorical and topic structures, ensuring semantic consistency between them. We propose an unsupervised mutual learning framework (UMLF) that jointly models rhetorical and topic structures, allowing them to mutually reinforce each other without requiring additional annotations.
arXiv Detail & Related papers (2024-05-30T08:10:50Z)
A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames [30.200413352223347]
We first propose a Multi-Intent dataset which is collected from a realistic in-Vehicle dialogue System, called MIVS. The target semantic frame is organized in a 3-layer hierarchical structure to tackle the alignment and assignment problems in multi-intent cases. We devise a BiRGAT model to encode the hierarchy of items, the backbone of which is a dual relational graph attention network.
arXiv Detail & Related papers (2024-02-28T11:39:26Z)
Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective. We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way. We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z)
Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics. We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context. Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z)
Unsupervised Dialogue Topic Segmentation with Topic-aware Utterance Representation [51.22712675266523]
Dialogue Topic Segmentation (DTS) plays an essential role in a variety of dialogue modeling tasks. We propose a novel unsupervised DTS framework, which learns topic-aware utterance representations from unlabeled dialogue data.
arXiv Detail & Related papers (2023-05-04T11:35:23Z)
Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders [59.038157066874255]
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data. RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously. A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
arXiv Detail & Related papers (2020-12-14T07:31:17Z)
Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way. Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses. Previous studies on recognizing and classifying inappropriate content have mostly focused on a single category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
