From Speech to Summary: A Comprehensive Survey of Speech Summarization
- URL: http://arxiv.org/abs/2504.08024v1
- Date: Thu, 10 Apr 2025 17:50:53 GMT
- Title: From Speech to Summary: A Comprehensive Survey of Speech Summarization
- Authors: Fabian Retkowski, Maike Züfle, Andreas Sudmann, Dinah Pfau, Jan Niehues, Alexander Waibel,
- Abstract summary: Speech summarization has become an essential tool for efficiently managing and accessing the growing volume of spoken and audiovisual content.<n>Despite its increasing importance, speech summarization is still not clearly defined and intersects with several research areas, including speech recognition, text summarization, and specific applications like meeting summarization.
- Score: 52.97157554560492
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech summarization has become an essential tool for efficiently managing and accessing the growing volume of spoken and audiovisual content. However, despite its increasing importance, speech summarization is still not clearly defined and intersects with several research areas, including speech recognition, text summarization, and specific applications like meeting summarization. This survey not only examines existing datasets and evaluation methodologies, which are crucial for assessing the effectiveness of summarization approaches but also synthesizes recent developments in the field, highlighting the shift from traditional systems to advanced models like fine-tuned cascaded architectures and end-to-end solutions.
Related papers
- Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective [12.178918299455898]
This paper presents a quantitative analysis based on information theory, focusing on information intersection between different modalities.
Our results show that this analysis is valuable for understanding the difficulties of audio-visual processing tasks as well as the benefits that could be obtained by modality integration.
arXiv Detail & Related papers (2024-09-29T06:30:46Z) - Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks [0.0]
We propose an exploration of how incorporating task-related information can enhance the summarization process.
Results show that integrating models with task-related information improves summary accuracy, even with varying word error rates.
arXiv Detail & Related papers (2024-09-16T08:15:35Z) - CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization [7.234196390284036]
This article summarizes the research on Transformer-based abstractive summarization for English dialogues.
We cover the main challenges present in dialog summarization (i.e., language, structure, comprehension, speaker, salience, and factuality)
We find that while some challenges, like language, have seen considerable progress, others, such as comprehension, factuality, and salience, remain difficult and hold significant research opportunities.
arXiv Detail & Related papers (2024-06-11T17:30:22Z) - Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with
Weak Supervision on Sentence Classification [91.13086984529706]
Aspect-based meeting transcript summarization aims to produce multiple summaries.
Traditional summarization methods produce one summary mixing information of all aspects.
We propose a two-stage method for aspect-based meeting transcript summarization.
arXiv Detail & Related papers (2023-11-07T19:06:31Z) - Learning Disentangled Speech Representations [0.412484724941528]
SynSpeech is a novel large-scale synthetic speech dataset designed to enable research on disentangled speech representations.<n>We present a framework to evaluate disentangled representation learning techniques, applying both linear probing and established supervised disentanglement metrics.<n>We find that SynSpeech facilitates benchmarking across a range of factors, achieving promising disentanglement of simpler features like gender and speaking style, while highlighting challenges in isolating complex attributes like speaker identity.
arXiv Detail & Related papers (2023-11-04T04:54:17Z) - Improving Speaker Diarization using Semantic Information: Joint Pairwise
Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z) - SummIt: Iterative Text Summarization via ChatGPT [12.966825834765814]
We propose SummIt, an iterative text summarization framework based on large language models like ChatGPT.
Our framework enables the model to refine the generated summary iteratively through self-evaluation and feedback.
We also conduct a human evaluation to validate the effectiveness of the iterative refinements and identify a potential issue of over-correction.
arXiv Detail & Related papers (2023-05-24T07:40:06Z) - A Focused Study on Sequence Length for Dialogue Summarization [68.73335643440957]
We analyze the length differences between existing models' outputs and the corresponding human references.
We identify salient features for summary length prediction by comparing different model settings.
Third, we experiment with a length-aware summarizer and show notable improvement on existing models if summary length can be well incorporated.
arXiv Detail & Related papers (2022-09-24T02:49:48Z) - Abstractive Meeting Summarization: A Survey [15.455647477995306]
A system that could reliably identify and sum up the most important points of a conversation would be valuable in a wide variety of real-world contexts.
Recent advances in deep learning has significantly improved language generation systems, opening the door to improved forms of abstractive summarization.
We provide an overview of the challenges raised by the task of abstractive meeting summarization and of the data sets, models and evaluation metrics that have been used to tackle the problems.
arXiv Detail & Related papers (2022-08-08T14:04:38Z) - An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and
Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z) - Salience Estimation with Multi-Attention Learning for Abstractive Text
Summarization [86.45110800123216]
In the task of text summarization, salience estimation for words, phrases or sentences is a critical component.
We propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation.
arXiv Detail & Related papers (2020-04-07T02:38:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.