Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech
- URL: http://arxiv.org/abs/2512.24517v1
- Date: Tue, 30 Dec 2025 23:29:51 GMT
- Title: Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech
- Authors: Fabian Retkowski, Alexander Waibel
- Abstract summary: We recast paragraph segmentation as the missing structuring step and fill three gaps at the intersection of speech processing and text segmentation. First, our benchmarks focus on the underexplored speech domain, where paragraph segmentation has traditionally not been part of post-processing. Second, we propose a constrained-decoding formulation that lets large language models insert paragraph breaks while preserving the original transcript. Third, we show that a compact model (MiniSeg) attains state-of-the-art accuracy and, when extended hierarchically, jointly predicts chapters and paragraphs with minimal computational cost.
- Score: 61.00008468914252
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automatic speech transcripts are often delivered as unstructured word streams that impede readability and repurposing. We recast paragraph segmentation as the missing structuring step and fill three gaps at the intersection of speech processing and text segmentation. First, we establish TEDPara (human-annotated TED talks) and YTSegPara (YouTube videos with synthetic labels) as the first benchmarks for the paragraph segmentation task. The benchmarks focus on the underexplored speech domain, where paragraph segmentation has traditionally not been part of post-processing, while also contributing to the wider text segmentation field, which still lacks robust and naturalistic benchmarks. Second, we propose a constrained-decoding formulation that lets large language models insert paragraph breaks while preserving the original transcript, enabling faithful, sentence-aligned evaluation. Third, we show that a compact model (MiniSeg) attains state-of-the-art accuracy and, when extended hierarchically, jointly predicts chapters and paragraphs with minimal computational cost. Together, our resources and methods establish paragraph segmentation as a standardized, practical task in speech processing.
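The constrained-decoding idea in the abstract can be made concrete with a minimal sketch. This is not the paper's implementation: the paper constrains an LLM's decoding, whereas here a hypothetical `break_score` function stands in for the model's per-boundary break probability. The point the sketch illustrates is the constraint itself: the only allowed action is inserting a break at a sentence boundary, so the original transcript is reproduced verbatim and evaluation stays sentence-aligned.

```python
# Hedged sketch of constrained paragraph segmentation: the model may only
# decide break / no-break at each sentence boundary; sentence text is never
# edited, so the output differs from the input only by "\n\n" breaks.
from typing import Callable, List


def segment_paragraphs(
    sentences: List[str],
    break_score: Callable[[List[str], int], float],
    threshold: float = 0.5,
) -> str:
    """Insert paragraph breaks between sentences without altering any sentence.

    `break_score(sentences, i)` is a hypothetical stand-in for a model's
    probability of a paragraph break after sentence i.
    """
    paragraphs: List[List[str]] = [[]]
    for i, sent in enumerate(sentences):
        paragraphs[-1].append(sent)
        is_last = i == len(sentences) - 1
        if not is_last and break_score(sentences, i) >= threshold:
            paragraphs.append([])
    return "\n\n".join(" ".join(p) for p in paragraphs)


# Toy scorer (illustrative only): break before sentences that open with a
# discourse cue word.
CUES = ("So", "Now", "Next", "Finally")


def toy_score(sentences: List[str], i: int) -> float:
    return 1.0 if sentences[i + 1].startswith(CUES) else 0.0


doc = ["We trained a model.", "It works well.",
       "Now, consider evaluation.", "We use F1."]
print(segment_paragraphs(doc, toy_score))
# → We trained a model. It works well.
#
#   Now, consider evaluation. We use F1.
```

Because breaks are the only edit, joining the output paragraphs back with spaces recovers the input transcript exactly, which is what makes the faithful, sentence-aligned evaluation described in the abstract possible.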
Related papers
- LESS: Label-Efficient and Single-Stage Referring 3D Segmentation [55.06002976797879]
Referring 3D is a visual-language task that segments all points of an object specified by a natural-language query from a 3D point cloud.
We propose a novel Referring 3D pipeline, Label-Efficient and Single-Stage, dubbed LESS, which is supervised only by efficient binary masks.
We achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU using only binary labels.
arXiv Detail & Related papers (2024-10-17T07:47:41Z) - From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark YTSeg focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce an efficient hierarchical segmentation model MiniSeg, that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z) - Long-Form End-to-End Speech Translation via Latent Alignment Segmentation [6.153530338207679]
Current simultaneous speech translation models can process only a few seconds of audio at a time.
We propose a novel segmentation approach for low-latency end-to-end speech translation.
We show that the proposed approach achieves state-of-the-art quality at no additional computational cost.
arXiv Detail & Related papers (2023-09-20T15:10:12Z) - Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation [65.6736056006381]
We present a multilingual punctuation-agnostic sentence segmentation method covering 85 languages.
Our method outperforms all the prior best sentence-segmentation tools by an average of 6.1% F1 points.
By using our method to match sentence segmentation to the segmentation used during training of MT models, we achieve an average improvement of 2.3 BLEU points.
arXiv Detail & Related papers (2023-05-30T09:49:42Z) - Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark [44.06803331843307]
Paragraph-level topic structure captures the overall context of a document at a higher level.
The lack of large-scale, high-quality Chinese paragraph-level topic structure corpora has hindered research and applications.
We propose a hierarchical paragraph-level topic structure representation with three layers to guide the corpus construction.
We employ a two-stage man-machine collaborative annotation method to construct the largest Chinese paragraph-level Topic Structure corpus.
arXiv Detail & Related papers (2023-05-24T06:43:23Z) - Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z) - Toward Unifying Text Segmentation and Long Document Summarization [31.084738269628748]
We study the role that section segmentation plays in extractive summarization of written and spoken documents.
Our approach learns robust sentence representations by performing summarization and segmentation simultaneously.
Our findings suggest that the model not only achieves state-of-the-art performance on publicly available benchmarks, but also demonstrates better cross-genre transferability.
arXiv Detail & Related papers (2022-10-28T22:07:10Z) - Structured Summarization: Unified Text Segmentation and Segment Labeling as a Generation Task [16.155438404910043]
We propose a single encoder-decoder neural network that can handle long documents and conversations.
We successfully show a way to solve the combined task as a pure generation task.
Our results establish a strong case for considering text segmentation and segment labeling as a whole.
arXiv Detail & Related papers (2022-09-28T01:08:50Z) - Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence [59.51720326054546]
We propose a long text generation model, which can represent the prefix sentences at sentence level and discourse level in the decoding process.
Our model can generate more coherent texts than state-of-the-art baselines.
arXiv Detail & Related papers (2021-05-19T07:29:08Z) - Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate at the sentence level, we consistently use a broader context.
arXiv Detail & Related papers (2020-12-07T12:09:37Z)
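The "independent supervised prediction" framing in the entry above can be sketched in a few lines. This is not the paper's implementation (which fine-tunes transformer embeddings of long sequences): here a trivial stand-in feature, bag-of-words cosine similarity between adjacent paragraphs, replaces learned representations, and the threshold is an illustrative assumption. The sketch only shows the task formulation: each candidate boundary is classified on its own, with no sequence-level decoding.

```python
# Hedged sketch: text segmentation as independent per-boundary prediction.
# Each gap between consecutive paragraphs gets its own binary decision
# (True = a new topical section starts here).
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_boundaries(paragraphs: List[str], threshold: float = 0.2) -> List[bool]:
    """Independently predict, for each adjacent paragraph pair, whether a
    topical change occurs between them (low similarity => boundary)."""
    bags = [Counter(p.lower().split()) for p in paragraphs]
    return [cosine(bags[i], bags[i + 1]) < threshold
            for i in range(len(bags) - 1)]


docs = [
    "the model is trained on speech transcripts from the model corpus",
    "the trained model segments speech transcripts into paragraphs",
    "pricing plans include a free tier and an enterprise tier",
]
print(predict_boundaries(docs))
# → [False, True]
```

Because every boundary is scored independently, the formulation parallelizes trivially; the trade-off, compared with sequence-level approaches like MiniSeg above, is that no global coherence constraint ties neighboring decisions together.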
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated list (including all information) and is not responsible for any consequences.