Towards End-to-end Speech-to-text Summarization
- URL: http://arxiv.org/abs/2306.05432v1
- Date: Tue, 6 Jun 2023 15:22:16 GMT
- Title: Towards End-to-end Speech-to-text Summarization
- Authors: Raul Monteiro and Diogo Pernes
- Abstract summary: Speech-to-text (S2T) summarization is a time-saving technique for filtering and keeping up with the broadcast news uploaded online on a daily basis.
End-to-end (E2E) modelling of S2T abstractive summarization is a promising approach that offers the possibility of generating rich latent representations.
We model S2T summarization both with a cascade and an E2E system for a corpus of broadcast news in French.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech-to-text (S2T) summarization is a time-saving technique for filtering
and keeping up with the broadcast news uploaded online on a daily basis. The
rise of large language models from deep learning with impressive text
generation capabilities has placed the research focus on summarization systems
that produce paraphrased compact versions of the document content, also known
as abstractive summaries. End-to-end (E2E) modelling of S2T abstractive
summarization is a promising approach that offers the possibility of generating
rich latent representations that leverage non-verbal and acoustic information,
as opposed to the use of only linguistic information from automatically
generated transcripts in cascade systems. However, the scarce literature on E2E
modelling of this task fails to explore different domains, namely broadcast
news, which is a challenging domain where large and diversified volumes of data
are presented to the user every day. We model S2T summarization both with a
cascade and an E2E system for a corpus of broadcast news in French. Our novel
E2E model leverages external data by resorting to transfer learning from a
pre-trained T2T summarizer. Experiments show that both our cascade and E2E
abstractive summarizers are stronger than an extractive baseline. However, the
performance of the E2E model still lags behind that of the cascade model, which is the
object of an extensive analysis that includes future directions to close that gap.
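The abstract compares the abstractive systems against an extractive baseline but does not say how that baseline works. Purely as a point of reference, here is a minimal sketch of one common kind of extractive summarizer applied to a transcript: it scores each sentence by the average frequency of its non-stopword tokens across the document and keeps the top `k` sentences in their original order. The function names, the frequency heuristic, the naive sentence splitter, the stopword list and the French example sentences are all assumptions for illustration, not the authors' baseline.

```python
import re
from collections import Counter

# Hypothetical frequency-based extractive baseline (illustration only; not the paper's baseline).


def split_sentences(text: str) -> list[str]:
    # Naive splitter: assumes the transcript already has sentence-final punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # \w is Unicode-aware for str patterns, so accented French letters are kept.
    return re.findall(r"\w+", sentence.lower())


def extractive_summary(text: str, k: int = 2, stopwords: frozenset = frozenset()) -> list[str]:
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s) if w not in stopwords)

    def score(sentence: str) -> float:
        # Average document frequency of the sentence's content words.
        toks = [w for w in tokenize(sentence) if w not in stopwords]
        return sum(freqs[w] for w in toks) / len(toks) if toks else 0.0

    # Rank by score (ties keep original order), then restore transcript order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


transcript = (
    "Le gouvernement annonce une réforme. La réforme concerne les retraites. "
    "Il fait beau. Le gouvernement défend la réforme."
)
stop = frozenset({"le", "la", "les", "une", "il"})
print(extractive_summary(transcript, k=2, stopwords=stop))
# ['Le gouvernement annonce une réforme.', 'Le gouvernement défend la réforme.']
```

Because such a baseline can only copy sentences from the automatic transcript, it cannot paraphrase or compress them, which is exactly what the abstractive cascade and E2E summarizers aim to do.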
Related papers
- Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Sen-SSum generates text summaries from a spoken document in a sentence-by-sentence manner.
We present two datasets for Sen-SSum: Mega-SSum and CSJ-SSum.
Date: 2024-08-01T00:18:21Z
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
Date: 2023-10-16T16:42:01Z
- Leveraging Large Text Corpora for End-to-End Speech Summarization
End-to-end speech summarization (E2E SSum) is a technique to directly generate summary sentences from speech.
We present two novel methods that leverage a large amount of external text summarization data for E2E SSum training.
Date: 2023-03-02T05:19:49Z
- Textless Direct Speech-to-Speech Translation with Discrete Speech Representation
We propose a novel model, Textless Translatotron, for training an end-to-end direct S2ST model without any textual supervision.
When a speech encoder pre-trained with unsupervised speech data is used for both models, the proposed model obtains translation quality nearly on par with Translatotron 2.
Date: 2022-10-31T19:48:38Z
- TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
TranSpeech is a speech-to-speech translation model with bilateral perturbation.
We establish a non-autoregressive S2ST technique, which repeatedly masks and predicts unit choices.
TranSpeech shows a significant improvement in inference latency, enabling a speedup of up to 21.4x over the autoregressive technique.
Date: 2022-05-25T06:34:14Z
- Topic-Guided Abstractive Text Summarization: a Joint Learning Approach
We introduce a new approach for abstractive text summarization, Topic-Guided Abstractive Summarization.
The idea is to incorporate neural topic modeling with a Transformer-based sequence-to-sequence (seq2seq) model in a joint learning framework.
Date: 2020-10-20T14:45:25Z
- SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding
Spoken language understanding requires a model to analyze the input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
Date: 2020-10-05T19:29:49Z
- Abstractive Summarization of Spoken and Written Instructions with BERT
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
Date: 2020-08-21T20:59:34Z
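The TranSpeech entry describes a non-autoregressive technique that "repeatedly masks and predicts unit choices". A minimal sketch of a generic mask-predict decoding loop in that spirit follows. It assumes the linear re-masking schedule of Mask-Predict decoding and uses a hypothetical `toy_predict` function with made-up target tokens in place of a trained model and real discrete speech units; it is not TranSpeech's implementation.

```python
from typing import Callable

MASK = "<mask>"

# A predictor maps a partially masked sequence to (tokens, confidences), one per position.
Predictor = Callable[[list[str]], tuple[list[str], list[float]]]


def mask_predict(length: int, predict: Predictor, iterations: int = 4) -> list[str]:
    """Generic mask-predict decoding: predict all positions in parallel,
    then repeatedly re-mask and re-predict the least confident ones."""
    # Iteration 0: every position is masked and predicted at once.
    tokens, probs = predict([MASK] * length)
    tokens, probs = list(tokens), list(probs)
    for t in range(1, iterations):
        # Linear schedule (assumed): re-mask fewer positions on each pass.
        n_mask = length * (iterations - t) // iterations
        if n_mask == 0:
            break
        # Pick the n_mask lowest-confidence positions.
        worst = sorted(range(length), key=lambda i: probs[i])[:n_mask]
        masked = list(tokens)
        for i in worst:
            masked[i] = MASK
        new_tokens, new_probs = predict(masked)
        # Update only the re-masked positions; the rest keep their tokens and scores.
        for i in worst:
            tokens[i] = new_tokens[i]
            probs[i] = new_probs[i]
    return tokens


# Hypothetical toy "model": it only commits to odd positions once
# at most two positions are still masked (i.e. when it has enough context).
TARGET = ["le", "chat", "dort", "."]


def toy_predict(seq: list[str]) -> tuple[list[str], list[float]]:
    n_masked = sum(tok == MASK for tok in seq)
    out, conf = [], []
    for i, tok in enumerate(seq):
        if tok != MASK:
            out.append(tok)
            conf.append(1.0)
        elif n_masked > 2 and i % 2 == 1:
            out.append("?")  # low-confidence guess
            conf.append(0.1)
        else:
            out.append(TARGET[i])
            conf.append(0.9)
    return out, conf


print(mask_predict(4, toy_predict, iterations=4))  # ['le', 'chat', 'dort', '.']
print(mask_predict(4, toy_predict, iterations=1))  # ['le', '?', 'dort', '?']
```

With a single iteration the loop degenerates to one fully parallel pass, so the uncertain positions are never revisited; extra iterations trade some of the latency gain for quality, since each pass is still parallel over positions.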