Political corpus creation through automatic speech recognition on EU debates
- URL: http://arxiv.org/abs/2304.08137v1
- Date: Mon, 17 Apr 2023 10:41:59 GMT
- Title: Political corpus creation through automatic speech recognition on EU debates
- Authors: Hugo de Vos and Suzan Verberne
- Abstract summary: We present a transcribed corpus of the LIBE committee of the EU parliament, totalling 3.6 Million running words.
The meetings of parliamentary committees of the EU are a potentially valuable source of information for political scientists, but the data is not readily available because it is only disclosed as speech recordings together with limited metadata.
We investigated the most appropriate Automatic Speech Recognition (ASR) model to create an accurate text transcription of the audio recordings of the meetings in order to make their content available for research and analysis.
- Score: 4.670305538969914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a transcribed corpus of the LIBE committee of the
EU parliament, totalling 3.6 million running words. The meetings of
parliamentary committees of the EU are a potentially valuable source of
information for political scientists, but the data is not readily available
because it is only disclosed as speech recordings together with limited metadata. The
meetings are in English, partly spoken by non-native speakers, and partly
spoken by interpreters. We investigated the most appropriate Automatic Speech
Recognition (ASR) model to create an accurate text transcription of the audio
recordings of the meetings in order to make their content available for
research and analysis. We focused on the unsupervised domain adaptation of the
ASR pipeline. Building on the transformer-based Wav2vec2.0 model, we
experimented with multiple acoustic models, language models and the addition of
domain-specific terms. We found that a domain-specific acoustic model and a
domain-specific language model give substantial improvements to the ASR output,
reducing the word error rate (WER) from 28.22 to 17.95. The use of
domain-specific terms in the decoding stage did not have a positive effect on
the quality of the ASR in terms of WER. Initial topic modelling results
indicated that the corpus is useful for downstream analysis tasks. We release
the resulting corpus and our analysis pipeline for future research.
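The reported improvement (WER from 28.22 to 17.95) uses the standard word error rate: the word-level edit distance between hypothesis and reference transcripts, divided by the number of reference words. A minimal self-contained illustration of the metric (not the authors' evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # sub/del/ins
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the committee will now vote", "the committee will vote")` gives 0.2: one deletion against five reference words.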
Related papers
- Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings [2.615008111842321]
We introduce an end-to-end scheme for topic segmentation using semantic speech encoders.
We propose a new benchmark for spoken news topic segmentation by utilizing a dataset featuring 1000 hours of publicly available recordings.
Our results demonstrate that while the traditional pipeline approach achieves a state-of-the-art $P_k$ score of 0.2431 for English, our end-to-end model delivers a competitive $P_k$ score of 0.2564.
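For context, the $P_k$ scores above come from the standard segmentation error metric: slide a window of width $k$ over the text and count how often reference and hypothesis disagree on whether the window's two endpoints fall in the same segment (lower is better). A simplified sketch under one common formulation, with segmentations encoded as strings where '1' marks a boundary:

```python
def pk(ref: str, hyp: str, k: int = None) -> float:
    """P_k segmentation error. ref/hyp are '0'/'1' strings where '1' marks
    a boundary after that position; 0.0 means identical segmentations."""
    if k is None:  # conventional default: half the mean reference segment length
        k = max(1, round(len(ref) / (ref.count("1") + 1) / 2))
    trials = len(ref) - k
    errors = 0
    for i in range(trials):
        # Do the two positions lie in the same segment (no boundary between)?
        same_ref = "1" not in ref[i:i + k]
        same_hyp = "1" not in hyp[i:i + k]
        errors += same_ref != same_hyp
    return errors / trials
```

Exact windowing conventions vary slightly between implementations; this is one defensible reading, not the benchmark's reference scorer.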
arXiv Detail & Related papers (2024-09-10T05:24:36Z)
- On decoder-only architecture for speech-to-text and large language model integration [59.49886892602309]
Speech-LLaMA is a novel approach that effectively incorporates acoustic information into text-based large language models.
We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines.
arXiv Detail & Related papers (2023-07-08T06:47:58Z)
- Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation [11.336431583289382]
This paper presents a method for end-to-end cross-lingual text-to-speech.
It aims to preserve the target language's pronunciation regardless of the original speaker's language.
arXiv Detail & Related papers (2022-10-31T12:44:53Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
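As a toy illustration of language modelling over sub-word linguistic units: the paper trains an LSTM, but the unit-level setup (sequences of phonemes or syllables rather than words) can be shown with a simple count-based bigram model, used here purely as a stand-in:

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Train a bigram model over unit sequences (e.g. phonemes or syllables)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pad with start/end markers so boundary transitions are modelled too.
        for prev, nxt in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[prev][nxt] += 1
    # Normalise counts into conditional probabilities P(next unit | prev unit).
    return {prev: {u: c / sum(nxts.values()) for u, c in nxts.items()}
            for prev, nxts in counts.items()}

# Illustrative ARPAbet-like phoneme sequences, not data from the paper.
model = train_bigram([["HH", "AH", "L", "OW"], ["HH", "AH", "T"]])
```

After training, `model["AH"]` gives the distribution over units following "AH" (here 0.5 each for "L" and "T").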
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- On Prosody Modeling for ASR+TTS based Voice Conversion [82.65378387724641]
In voice conversion, an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic contents.
Such a paradigm, referred to as ASR+TTS, overlooks the modeling of prosody, which plays an important role in speech naturalness and conversion similarity.
We propose to directly predict prosody from the linguistic representation in a target-speaker-dependent manner, referred to as target text prediction (TTP).
arXiv Detail & Related papers (2021-07-20T13:30:23Z)
- Seed Words Based Data Selection for Language Model Adaptation [11.59717828860318]
We present an approach for automatically selecting sentences, from a text corpus, that match, both semantically and morphologically, a glossary of terms furnished by the user.
The vocabulary of the baseline model is expanded and tailored, reducing the resulting OOV rate.
Results using different metrics (OOV rate, WER, precision and recall) show the effectiveness of the proposed techniques.
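The selection idea can be caricatured in a few lines: score each corpus sentence against a user-supplied glossary and keep the sentences that match. Lowercase prefix matching stands in here for the semantic and morphological matching of the paper; the function name and threshold are illustrative only:

```python
def select_sentences(corpus, glossary, min_hits=1):
    """Keep sentences containing at least `min_hits` glossary-term matches.
    A word matches a term if it starts with the term (crude stemming)."""
    terms = [t.lower() for t in glossary]
    selected = []
    for sentence in corpus:
        words = [w.lower().strip(".,;:") for w in sentence.split()]
        hits = sum(any(w.startswith(t) for t in terms) for w in words)
        if hits >= min_hits:
            selected.append(sentence)
    return selected
```

Sentences retained this way can then be used to adapt the language model and expand its vocabulary, which is what drives the OOV-rate reduction reported above.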
arXiv Detail & Related papers (2021-07-20T12:08:27Z)
- VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion [54.29557210925752]
One-shot voice conversion can be effectively achieved by speech representation disentanglement.
We employ vector quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training.
Experimental results reflect the superiority of the proposed method in learning effective disentangled speech representations.
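At its core, vector quantization for content encoding replaces each continuous frame vector with the index of its nearest codebook entry. A minimal sketch with a hypothetical two-dimensional codebook (the paper's model learns the codebook jointly with the encoder):

```python
def quantize(frame, codebook):
    """Return the index of the codebook vector nearest to `frame` (Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(frame, codebook[i]))

# Toy codebook for illustration; real codebooks have hundreds of entries.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
```

For example, `quantize((0.9, 0.1), codebook)` returns 1, the entry closest to the frame.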
arXiv Detail & Related papers (2021-06-18T13:50:38Z)
- Generative Spoken Language Modeling from Raw Audio [42.153136032037175]
Generative spoken language modeling involves learning jointly the acoustic and linguistic characteristics of a language from raw audio only (without text or labels).
We introduce metrics to automatically evaluate the generated output in terms of acoustic and linguistic quality in two associated end-to-end tasks.
We test baseline systems consisting of a discrete speech encoder (returning discrete, low-bitrate pseudo-text units), a generative language model (trained on pseudo-text units) and a speech decoder.
arXiv Detail & Related papers (2021-02-01T21:41:40Z)
- SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)
- Contextualized Spoken Word Representations from Convolutional Autoencoders [2.28438857884398]
This paper proposes a Convolutional Autoencoder based neural architecture to model syntactically and semantically adequate contextualized representations of varying length spoken words.
The proposed model was able to demonstrate its robustness when compared to the other two language-based models.
arXiv Detail & Related papers (2020-07-06T16:48:11Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.