FullStop: Punctuation and Segmentation Prediction for Dutch with
Transformers
- URL: http://arxiv.org/abs/2301.03319v1
- Date: Mon, 9 Jan 2023 13:12:05 GMT
- Title: FullStop: Punctuation and Segmentation Prediction for Dutch with
Transformers
- Authors: Vincent Vandeghinste, Oliver Guhr
- Abstract summary: The model we present is an extension of the models of Guhr et al. (2021) for Dutch and is made publicly available.
For every word in the input sequence, the model predicts the punctuation marker that follows the word.
Results are much better than a machine translation baseline approach.
- Score: 1.2246649738388389
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: When applying automated speech recognition (ASR) for Belgian Dutch (Van Dyck
et al. 2021), the output consists of an unsegmented stream of words, without
any punctuation. A next step is to perform segmentation and insert punctuation,
making the ASR output more readable and easy to manually correct. As far as we
know there is no publicly available punctuation insertion system for Dutch that
functions at a usable level. The model we present here is an extension of the
models of Guhr et al. (2021) for Dutch and is made publicly available. We
trained a sequence classification model, based on the Dutch language model
RobBERT (Delobelle et al. 2020). For every word in the input sequence, the
model predicts the punctuation marker that follows the word. We have also
extended a multilingual model, for cases where the language is unknown or where
code switching applies. For the segmentation task, when applying the best
models to out-of-domain test data, a sliding window of 200 words of the ASR
output stream is sent to the classifier, and a segment boundary is inserted
when the system predicts a segmenting punctuation sign with a ratio above a
threshold. Results are much better than a machine translation baseline
approach.
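The sliding-window procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: `predict_marker` is a hypothetical stand-in for the RobBERT-based classifier, and the voting-over-overlapping-windows scheme is one plausible reading of the "ratio above threshold" criterion.

```python
# Illustrative sketch (not the authors' released code): sliding-window
# segmentation of an unpunctuated ASR word stream. `predict_marker` is a
# hypothetical stand-in for the RobBERT-based token classifier, returning
# one punctuation marker (possibly empty) per input word.
from typing import Callable, List

SEGMENTING = {".", "?", "!"}  # punctuation signs treated as segment boundaries


def segment_stream(words: List[str],
                   predict_marker: Callable[[List[str]], List[str]],
                   window: int = 200,
                   threshold: float = 0.5) -> List[List[str]]:
    """Slide a fixed-size window over the stream; cut a segment after a word
    when the ratio of windows predicting a segmenting marker there exceeds
    the threshold."""
    votes = [0] * len(words)  # segmenting-marker votes per position
    seen = [0] * len(words)   # how many windows covered each position
    step = max(1, window // 2)
    for start in range(0, max(1, len(words) - window + 1), step):
        chunk = words[start:start + window]
        for offset, marker in enumerate(predict_marker(chunk)):
            pos = start + offset
            seen[pos] += 1
            if marker in SEGMENTING:
                votes[pos] += 1
    segments, current = [], []
    for pos, word in enumerate(words):
        current.append(word)
        if seen[pos] and votes[pos] / seen[pos] > threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
```

Overlapping windows mean each position is classified more than once, which is what makes a ratio-based decision possible rather than a single hard prediction.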
Related papers
- Generative Spoken Language Model based on continuous word-sized audio
tokens [52.081868603603844]
We introduce a Generative Spoken Language Model based on word-size continuous-valued audio embeddings.
The resulting model is the first generative language model based on word-size continuous embeddings.
arXiv Detail & Related papers (2023-10-08T16:46:14Z) - ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource
Speech Translation Tasks [8.651248939672769]
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation.
We build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for ASR.
Our results highlight that self-supervised models trained on smaller sets of target data are more effective for low-resource end-to-end ST fine-tuning than large off-the-shelf models.
arXiv Detail & Related papers (2022-05-04T10:36:57Z) - Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo
Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z) - Word Segmentation on Discovered Phone Units with Dynamic Programming and
Self-Supervised Scoring [23.822788597966646]
Recent work on unsupervised speech segmentation has used self-supervised models with a phone segmentation module and a word segmentation module that are trained jointly.
This paper compares this joint methodology with an older idea: bottom-up phone-like unit discovery is performed first, and symbolic word segmentation is then performed on top of the discovered units.
The paper specifically describes a duration-penalized dynamic programming (DPDP) procedure that can be used for either phone or word segmentation by changing the self-supervised scoring network that gives segment costs.
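As a rough illustration of the DPDP idea (not the paper's actual implementation), a dynamic program over segment costs can be written as below; `segment_cost` stands in for the self-supervised scoring network, and a fixed per-segment penalty is used as a simplified version of the duration term.

```python
# Illustrative sketch of duration-penalized dynamic programming (DPDP)
# segmentation. `segment_cost(i, j)` is a stand-in for the self-supervised
# scoring network; a fixed per-segment penalty approximates the duration term.
import math
from typing import Callable, List, Tuple


def dpdp_segment(n: int,
                 segment_cost: Callable[[int, int], float],
                 seg_penalty: float = 1.0,
                 max_len: int = 10) -> List[Tuple[int, int]]:
    """Find the segmentation of positions 0..n minimizing the sum of
    segment costs plus a penalty per segment."""
    best = [math.inf] * (n + 1)  # best[j]: minimal cost of the length-j prefix
    back = [0] * (n + 1)         # back-pointer to the previous boundary
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            c = best[i] + segment_cost(i, j) + seg_penalty
            if c < best[j]:
                best[j] = c
                back[j] = i
    # Recover (start, end) boundaries by walking the back-pointers.
    bounds, j = [], n
    while j > 0:
        bounds.append((back[j], j))
        j = back[j]
    return bounds[::-1]
```

With a within-segment variance as the cost, the penalty trades off segment purity against the number of segments, which is the role the duration penalty plays in DPDP.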
arXiv Detail & Related papers (2022-02-24T07:02:56Z) - Dealing with training and test segmentation mismatch: FBK@IWSLT2021 [13.89298686257514]
This paper describes FBK's system submission to the IWSLT 2021 Offline Speech Translation task.
It is a Transformer-based architecture trained to translate English speech audio data into German texts.
The training pipeline is characterized by knowledge distillation and a two-step fine-tuning procedure.
arXiv Detail & Related papers (2021-06-23T18:11:32Z) - Fast End-to-End Speech Recognition via a Non-Autoregressive Model and
Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position-dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z) - Off-Line Arabic Handwritten Words Segmentation using Morphological
Operators [0.0]
The framework is proposed based on three steps: pre-processing, segmentation, and evaluation.
The proposed model achieved the highest accuracy when compared with the related works.
arXiv Detail & Related papers (2021-01-07T23:38:53Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z) - The Sequence-to-Sequence Baseline for the Voice Conversion Challenge
2020: Cascading ASR and TTS [66.06385966689965]
This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice conversion challenge (VCC) 2020.
We consider a naive approach for voice conversion (VC), which is to first transcribe the input speech with an automatic speech recognition (ASR) model.
We revisit this method under a sequence-to-sequence (seq2seq) framework by utilizing ESPnet, an open-source end-to-end speech processing toolkit.
arXiv Detail & Related papers (2020-10-06T02:27:38Z) - Automatic Machine Translation Evaluation in Many Languages via Zero-Shot
Paraphrasing [11.564158965143418]
We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser.
We propose training the paraphraser as a multilingual NMT system, treating paraphrasing as a zero-shot translation task.
Our method is simple and intuitive, and does not require human judgements for training.
arXiv Detail & Related papers (2020-04-30T03:32:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.