Automatic Speech Summarisation: A Scoping Review
- URL: http://arxiv.org/abs/2008.11897v1
- Date: Thu, 27 Aug 2020 03:15:40 GMT
- Title: Automatic Speech Summarisation: A Scoping Review
- Authors: Dana Rezazadegan, Shlomo Berkovsky, Juan C. Quiroz, A. Baki Kocaballi,
Ying Wang, Liliana Laranjo, Enrico Coiera
- Abstract summary: This scoping review maps the speech summarisation literature with no restrictions on time frame, language summarised, research method, or paper type.
We reviewed a total of 110 papers out of a set of 153 found through a literature search and extracted speech features used, methods, scope, and training corpora.
- Score: 7.755991028607979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech summarisation techniques take human speech as input and then output an
abridged version as text or speech. Speech summarisation has applications in
many domains from information technology to health care, for example improving
speech archives or reducing clinical documentation burden. This scoping review
maps the speech summarisation literature, with no restrictions on time frame,
language summarised, research method, or paper type. We reviewed a total of 110
papers out of a set of 153 found through a literature search and extracted
speech features used, methods, scope, and training corpora. Most studies employ
one of four speech summarisation architectures: (1) Sentence extraction and
compaction; (2) Feature extraction and classification or rank-based sentence
selection; (3) Sentence compression and compression summarisation; and (4)
Language modelling. We also discuss the strengths and weaknesses of these
different methods and speech features. Overall, supervised methods (e.g.
hidden Markov support vector machines, ranking support vector machines,
conditional random fields) performed better than unsupervised methods. Since
supervised methods require manually annotated training data, which can be
costly to produce, there has been greater interest in unsupervised methods.
Recent research into unsupervised methods focuses on extending language
modelling, for example by combining unigram modelling with deep neural
networks. Protocol registration: The protocol for this scoping review is
registered at https://osf.io.
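The second architecture in the taxonomy above (feature extraction followed by rank-based sentence selection) can be illustrated with a minimal extractive sketch. This is a hypothetical toy, not any surveyed system: TF-IDF term weighting stands in for the richer lexical and acoustic features the reviewed methods actually use, and `rank_based_summary` is an illustrative name.

```python
# Hypothetical sketch of rank-based extractive summarisation:
# score each transcript sentence by a simple feature (summed TF-IDF
# weights of its words), then select the top-k sentences. Real
# systems in the review add acoustic/prosodic features and learned
# rankers (e.g. ranking SVMs) in place of this heuristic score.
import math
from collections import Counter

def rank_based_summary(sentences, k=2):
    """Keep the k highest-scoring sentences, in original order."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency of each word across sentences.
    df = Counter(w for d in docs for w in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Sum of TF-IDF weights over the sentence's words.
        scores.append(sum(tf[w] * math.log(n / df[w]) for w in tf))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences dominated by common words score near zero, while sentences with distinctive content words are kept, which is the intuition behind the rank-based selection family.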
Related papers
- Exploring Speech Recognition, Translation, and Understanding with
Discrete Speech Units: A Comparative Study [68.88536866933038]
Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies.
Recent investigations proposed the use of discrete speech units derived from self-supervised learning representations.
Applying various methods, such as de-duplication and subword modeling, can further compress the speech sequence length.
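The de-duplication step mentioned above can be sketched in a few lines: self-supervised models often emit the same discrete unit for many consecutive frames, so collapsing runs shortens the sequence with no model at all. This is an illustrative sketch of the general idea only; subword modeling (e.g. BPE over the unit vocabulary) would compress further.

```python
# Collapse runs of repeated discrete speech units, e.g.
# [5, 5, 5, 2, 2, 7] -> [5, 2, 7]. groupby yields one group per
# maximal run of equal consecutive elements.
from itertools import groupby

def deduplicate(units):
    """Return the unit sequence with consecutive repeats removed."""
    return [u for u, _ in groupby(units)]
```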
arXiv Detail & Related papers (2023-09-27T17:21:13Z)
- Learning Speech Representation From Contrastive Token-Acoustic Pretraining [57.08426714676043]
We propose "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space.
The proposed CTAP model is trained on 210k speech and phoneme pairs, achieving minimally-supervised text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR).
arXiv Detail & Related papers (2023-09-01T12:35:43Z)
- SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training [106.34112664893622]
We propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder.
Our proposed SpeechUT is fine-tuned and evaluated on automatic speech recognition (ASR) and speech translation (ST) tasks.
arXiv Detail & Related papers (2022-10-07T17:57:45Z)
- SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data [100.46303484627045]
We propose a cross-modal Speech and Language Model (SpeechLM) to align speech and text pre-training with a pre-defined unified representation.
Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities.
We evaluate SpeechLM on various spoken language processing tasks including speech recognition, speech translation, and universal representation evaluation framework SUPERB.
arXiv Detail & Related papers (2022-09-30T09:12:10Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Unified Speech-Text Pre-training for Speech Translation and Recognition [113.31415771943162]
We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition.
The proposed method incorporates four self-supervised and supervised subtasks for cross modality learning.
It achieves improvements of 1.7 to 2.3 BLEU over the state of the art on the MuST-C speech translation dataset.
arXiv Detail & Related papers (2022-04-11T20:59:51Z)
- Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features [41.951988293049205]
We propose a two-stage approach, which comprises unsupervised acoustic modeling and decoding, followed by pattern mining in acoustic unit sequences.
The proposed system is able to effectively extract topic-related words and phrases from the lecture recordings on MIT OpenCourseWare.
arXiv Detail & Related papers (2020-11-03T20:06:48Z)
- Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data [1.2031796234206138]
Most state-of-the-art speech systems use Deep Neural Networks (DNNs), which require large amounts of training data to learn.
We focus on the following speech processing tasks: automatic speech recognition, speaker identification, and emotion recognition.
arXiv Detail & Related papers (2020-03-09T16:26:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.