Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation
- URL: http://arxiv.org/abs/2601.02128v1
- Date: Mon, 05 Jan 2026 14:00:48 GMT
- Title: Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation
- Authors: Steffen Freisinger, Philipp Seeberger, Thomas Ranzenberger, Tobias Bocklet, Korbinian Riedhammer
- Abstract summary: We introduce a novel approach to hierarchical topic segmentation in transcripts, generating multi-level tables of contents. We compare zero-shot prompting and LoRA fine-tuning on large language models, while also exploring the integration of high-level speech pause features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting speech transcripts into thematic sections benefits both downstream processing and users who depend on written text for accessibility. We introduce a novel approach to hierarchical topic segmentation in transcripts, generating multi-level tables of contents that capture both topic and subtopic boundaries. We compare zero-shot prompting and LoRA fine-tuning on large language models, while also exploring the integration of high-level speech pause features. Evaluations on English meeting recordings and multilingual lecture transcripts (Portuguese, German) show significant improvements over established topic segmentation baselines. Additionally, we adapt a common evaluation measure for multi-level segmentation, taking into account all hierarchical levels within one metric.
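The abstract contrasts zero-shot prompting with LoRA fine-tuning. The core LoRA idea can be sketched in a few lines of NumPy (an illustrative toy, not the paper's setup; the dimensions, rank, scaling factor, and initialization scale below are assumed for demonstration): a frozen pretrained weight W is augmented with a trainable low-rank update BA, with B initialized to zero so the adapted layer starts out identical to the pretrained one.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 16, 4          # layer dims and low rank (illustrative values)

W = rng.normal(size=(d, k))  # frozen pretrained weight

# Trainable low-rank factors. B starts at zero so the adapted
# layer initially reproduces the pretrained layer exactly.
A = rng.normal(scale=0.01, size=(r, k))
B = np.zeros((d, r))
alpha = 8.0                  # LoRA scaling factor

def lora_forward(x):
    """y = x W^T + (alpha / r) * x (B A)^T -- only A and B are trained."""
    return x @ W.T + (alpha / r) * (x @ (B @ A).T)

x = rng.normal(size=(2, k))
# With B = 0 the adapter contributes nothing yet.
assert np.allclose(lora_forward(x), x @ W.T)
```

Only the r(d + k) adapter parameters are updated during fine-tuning, which is what makes LoRA attractive for adapting large language models to segmentation tasks like this one.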
Related papers
- Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech [61.00008468914252]
We recast paragraph segmentation as the missing structuring step and fill three gaps at the intersection of speech processing and text segmentation. First, our benchmarks focus on the underexplored speech domain, where paragraph segmentation has traditionally not been part of post-processing. Second, we propose a constrained-decoding formulation that lets large language models insert paragraph breaks while preserving the original transcript. Third, we show that a compact model (MiniSeg) attains state-of-the-art accuracy and, when extended hierarchically, jointly predicts chapters and paragraphs with minimal computational cost.
arXiv Detail & Related papers (2025-12-30T23:29:51Z)
- Dense Video Captioning using Graph-based Sentence Summarization [80.52481563888459]
We propose a graph-based partition-and-summarization framework for dense video captioning. We focus on the "summarization" stage and effectively exploit the relationships between semantic words for summarization.
arXiv Detail & Related papers (2025-06-25T16:23:43Z)
- Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings [2.615008111842321]
We introduce an end-to-end scheme for topic segmentation using semantic speech encoders.
We propose a new benchmark for spoken news topic segmentation by utilizing a dataset featuring 1000 hours of publicly available recordings.
Our results demonstrate that while the traditional pipeline approach achieves a state-of-the-art $P_k$ score of 0.2431 for English, our end-to-end model delivers a competitive $P_k$ score of 0.2564.
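The $P_k$ scores quoted above come from the standard windowed segmentation error metric: slide a window of size k over the text and count how often reference and hypothesis disagree on whether the window's endpoints fall in the same segment. A minimal self-contained sketch (the boundary-string encoding and the half-mean-segment-length default window are conventional choices, not taken from the paper):

```python
def pk(ref, hyp, k=None):
    """P_k segmentation error (lower is better).

    ref and hyp are boundary strings such as '0100100', where '1'
    marks a topic boundary after that position.
    """
    if k is None:
        # Conventional choice: half the mean reference segment length.
        k = max(1, round(len(ref) / (ref.count('1') + 1) / 2))
    disagreements = 0
    n = len(ref) - k
    for i in range(n):
        # Do positions i and i+k fall in the same segment?
        same_ref = '1' not in ref[i:i + k]
        same_hyp = '1' not in hyp[i:i + k]
        disagreements += same_ref != same_hyp
    return disagreements / n

# A hypothesis identical to the reference scores 0.
assert pk('01000100', '01000100') == 0.0
```

The multi-level adaptation described in the main abstract would extend a measure like this across hierarchy levels; the sketch above covers only the flat single-level case.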
arXiv Detail & Related papers (2024-09-10T05:24:36Z)
- TreeSeg: Hierarchical Topic Segmentation of Large Transcripts [0.0]
We present TreeSeg, an approach that combines off-the-shelf embedding models with divisive clustering, to generate hierarchical, structured segmentations of transcripts in the form of binary trees.
We evaluate TreeSeg on the ICSI and AMI corpora, demonstrating that it outperforms all baselines.
Finally, we introduce TinyRec, a small-scale corpus of manually annotated transcripts, obtained from self-recorded video sessions.
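TreeSeg's combination of off-the-shelf embeddings and divisive clustering can be sketched as follows (a simplified toy: it assumes cosine similarity between the mean embeddings of the two candidate halves as the split criterion and a fixed minimum segment length; TreeSeg's actual criterion and stopping rule may differ):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def split_point(embs):
    """Pick the boundary whose two halves have the least similar means."""
    scores = [cosine(embs[:b].mean(axis=0), embs[b:].mean(axis=0))
              for b in range(1, len(embs))]
    return 1 + int(np.argmin(scores))

def divisive_tree(embs, start=0, min_len=2):
    """Recursively split utterance embeddings into a binary tree.

    Internal nodes are (left, right) pairs; leaves are (start, end)
    index spans into the original utterance sequence.
    """
    if len(embs) < 2 * min_len:
        return (start, start + len(embs))
    b = split_point(embs)
    if b < min_len or len(embs) - b < min_len:
        return (start, start + len(embs))
    return (divisive_tree(embs[:b], start, min_len),
            divisive_tree(embs[b:], start + b, min_len))
```

Top-down (divisive) splitting naturally yields the binary-tree hierarchy the abstract describes: the first split gives the coarsest topic boundary, and deeper recursion levels give subtopics.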
arXiv Detail & Related papers (2024-06-28T23:49:26Z)
- From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark YTSeg focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce MiniSeg, an efficient hierarchical segmentation model that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z)
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z)
- Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark [44.06803331843307]
Paragraph-level topic structure captures the overall context of a document at a higher level.
However, the lack of large-scale, high-quality Chinese paragraph-level topic structure corpora has constrained research and applications.
We propose a hierarchical paragraph-level topic structure representation with three layers to guide the corpus construction.
We employ a two-stage man-machine collaborative annotation method to construct the largest Chinese paragraph-level Topic Structure corpus.
arXiv Detail & Related papers (2023-05-24T06:43:23Z)
- SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations [2.535399238341164]
End-to-end Speech Translation is hindered by a lack of available data resources.
We propose a new data augmentation strategy, SegAugment, to address this issue.
We show that the proposed method can also successfully augment sentence-level datasets.
arXiv Detail & Related papers (2022-12-19T18:29:31Z)
- Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts [0.0]
We build a novel dataset of complete transcriptions of over 400 podcast episodes and identify the introduction of each episode.
These introductions contain information about the episodes' topics, hosts, and guests.
We train three Transformer models based on the pre-trained BERT and different augmentation strategies.
arXiv Detail & Related papers (2021-10-14T00:34:51Z)
- Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate at the sentence level, we consistently use a broader context.
arXiv Detail & Related papers (2020-12-07T12:09:37Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.