Codebook-Injected Dialogue Segmentation for Multi-Utterance Constructs Annotation: LLM-Assisted and Gold-Label-Free Evaluation
- URL: http://arxiv.org/abs/2601.12061v2
- Date: Thu, 22 Jan 2026 14:16:14 GMT
- Title: Codebook-Injected Dialogue Segmentation for Multi-Utterance Constructs Annotation: LLM-Assisted and Gold-Label-Free Evaluation
- Authors: Jinsook Lee, Kirk Vanacore, Zhuqian Zhou, Bakhtawar Ahtisham, Jeanine Grutter, Rene F. Kizilcec
- Abstract summary: Dialogue Act (DA) annotation treats communicative or pedagogical intent as localized to individual utterances or turns. We propose codebook-injected segmentation, which conditions boundary decisions on downstream annotation criteria. We find DA-awareness produces segments that are internally more consistent than text-only baselines.
- Score: 0.17240671897505613
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dialogue Act (DA) annotation typically treats communicative or pedagogical intent as localized to individual utterances or turns. This leads annotators to agree on the underlying action while disagreeing on segment boundaries, reducing apparent reliability. We propose codebook-injected segmentation, which conditions boundary decisions on downstream annotation criteria, and evaluate LLM-based segmenters against standard and retrieval-augmented baselines. To assess these without gold labels, we introduce evaluation metrics for span consistency, distinctiveness, and human-AI distributional agreement. We find that DA-awareness produces segments that are internally more consistent than text-only baselines. While LLMs excel at creating construct-consistent spans, coherence-based baselines remain superior at detecting global shifts in dialogue flow. Across two datasets, no single segmenter dominates. Improvements in within-segment coherence frequently trade off against boundary distinctiveness and human-AI distributional agreement. These results highlight segmentation as a consequential design choice that should be optimized for downstream objectives rather than a single performance score.
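The abstract names gold-label-free metrics for span consistency and distinctiveness but does not define them here. The following is a minimal sketch of one plausible reading, assuming each utterance is already represented by an embedding vector: consistency as mean pairwise cosine similarity inside a segment, and distinctiveness as one minus the cosine similarity between centroids of adjacent segments. All function names and the toy vectors are hypothetical, and these formulas are illustrative assumptions, not the paper's definitions.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_segments(vectors, boundaries):
    """Split utterance vectors into segments.

    A boundary index i means that a new segment starts at utterance i.
    """
    cuts = [0] + sorted(boundaries) + [len(vectors)]
    return [vectors[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]


def centroid(segment):
    """Component-wise mean of the vectors in one segment."""
    n = len(segment)
    return [sum(column) / n for column in zip(*segment)]


def within_consistency(segments):
    """Assumed metric: mean pairwise cosine inside each segment,
    averaged over segments that contain at least two utterances."""
    scores = []
    for seg in segments:
        pairs = list(combinations(seg, 2))
        if pairs:
            scores.append(sum(cosine(u, v) for u, v in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


def adjacent_distinctiveness(segments):
    """Assumed metric: 1 - cosine between centroids of adjacent segments,
    averaged over all adjacent pairs."""
    cents = [centroid(seg) for seg in segments]
    gaps = [1.0 - cosine(a, b) for a, b in zip(cents, cents[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


# Toy 2-d "embeddings": two utterances on one topic, then two on another.
utterances = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = split_segments(utterances, [2])     # boundary at the topic shift
misaligned = split_segments(utterances, [1])  # boundary one utterance early

print(within_consistency(aligned), adjacent_distinctiveness(aligned))
print(within_consistency(misaligned), adjacent_distinctiveness(misaligned))
```

On this toy input the aligned boundary scores higher on both quantities. Real dialogues need not behave this way: the abstract reports that gains in within-segment coherence frequently trade off against boundary distinctiveness.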
Related papers
- Unsupervised Text Segmentation via Kernel Change-Point Detection on Sentence Embeddings [0.0]
Unsupervised text segmentation is crucial because boundary labels are expensive, subjective, and often fail to transfer across domains and granularity choices. We propose Embed-KCPD, a training-free method that represents sentences as embedding vectors and estimates boundaries by minimizing a penalized KCPD objective. A case study on Taylor Swift's tweets illustrates that Embed-KCPD combines strong theoretical guarantees, simulated reliability, and practical effectiveness for text segmentation.
arXiv Detail & Related papers (2026-01-26T18:54:34Z)
- AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering [97.52852990265136]
We introduce AQAScore, a backbone-agnostic evaluation framework that leverages the reasoning capabilities of audio-aware large language models. We evaluate AQAScore across multiple benchmarks, including human-rated relevance, pairwise comparison, and compositional reasoning tasks.
arXiv Detail & Related papers (2026-01-21T07:35:36Z)
- SegNSP: Revisiting Next Sentence Prediction for Linear Text Segmentation [5.476083747178194]
We introduce SegNSP, framing linear text segmentation as a next sentence prediction (NSP) task. We propose a label-agnostic NSP approach, which predicts whether the next sentence continues the current topic without requiring explicit topic labels. We evaluate our model against established baselines on two datasets, CitiLink-Minutes and WikiSection.
arXiv Detail & Related papers (2026-01-07T00:02:30Z)
- Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech [61.00008468914252]
We recast paragraph segmentation as the missing structuring step and fill three gaps at the intersection of speech processing and text segmentation. [...] benchmarks focus on the underexplored speech domain, where paragraph segmentation has traditionally not been part of post-processing. Second, we propose a constrained-decoding formulation that lets large language models insert paragraph breaks while preserving the original transcript. Third, we show that a compact model (MiniSeg) attains state-of-the-art accuracy and, when extended hierarchically, jointly predicts chapters and paragraphs with minimal computational cost.
arXiv Detail & Related papers (2025-12-30T23:29:51Z)
- When F1 Fails: Granularity-Aware Evaluation for Dialogue Topic Segmentation [0.0]
This paper introduces an evaluation framework that reports boundary density and segment alignment diagnostics (purity and coverage) alongside window-tolerant F1 (W-F1). By separating boundary scoring from boundary selection, we evaluate segmentation quality across density regimes rather than at a single operating point.
arXiv Detail & Related papers (2025-12-18T21:29:43Z)
- SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space [11.534994345027362]
Multimodal large language models (MLLMs) have shown impressive capabilities in vision-language tasks such as reasoning segmentation. We introduce a novel adversarial paraphrasing task: generating grammatically correct paraphrases that preserve the original query meaning while degrading segmentation performance. We introduce SPARTA, a black-box, sentence-level optimization method that operates in the low-dimensional semantic latent space of a text autoencoder.
arXiv Detail & Related papers (2025-10-28T14:09:05Z)
- LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance [54.683384204063934]
Large multi-modal models (LMMs) struggle with inaccurate segmentation and hallucinated comprehension. We propose LIRA, a framework that capitalizes on the complementary relationship between visual comprehension and segmentation. LIRA achieves state-of-the-art performance in both segmentation and comprehension tasks.
arXiv Detail & Related papers (2025-07-08T07:46:26Z)
- Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z)
- Sentiment Analysis through LLM Negotiations [58.67939611291001]
A standard paradigm for sentiment analysis is to rely on a single LLM and make the decision in a single round.
This paper introduces a multi-LLM negotiation framework for sentiment analysis.
arXiv Detail & Related papers (2023-11-03T12:35:29Z)
- Panoptic Out-of-Distribution Segmentation [11.388678390784195]
We propose Panoptic Out-of-Distribution Segmentation for joint pixel-level semantic in-distribution and out-of-distribution classification with instance prediction.
We make the dataset, code, and trained models publicly available at http://pods.cs.uni-freiburg.de.
arXiv Detail & Related papers (2023-10-18T08:38:31Z)
- SWING: Balancing Coverage and Faithfulness for Dialogue Summarization [67.76393867114923]
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals to encourage the model to generate content in the reference summaries that have not been covered.
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z)
- FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows [63.116280145770006]
We propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.
To utilize segment act flows, sequences of segment acts, for evaluation, we develop the first consensus-based dialogue evaluation framework, FlowEval.
arXiv Detail & Related papers (2022-02-14T11:37:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.