Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models
- URL: http://arxiv.org/abs/2306.04980v1
- Date: Thu, 8 Jun 2023 07:10:39 GMT
- Title: Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models
- Authors: Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien
- Abstract summary: This work introduces approaches to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs) and large language models (LLMs).
- Score: 7.782346535009883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces approaches to assessing phrase breaks in ESL learners'
speech using pre-trained language models (PLMs) and large language models
(LLMs). There are two tasks: overall assessment of phrase break for a speech
clip and fine-grained assessment of every possible phrase break position. To
leverage NLP models, speech input is first force-aligned with texts, and then
pre-processed into a token sequence, including words and phrase break
information. To utilize PLMs, we propose a pre-training and fine-tuning
pipeline with the processed tokens. This process includes pre-training with a
replaced break token detection module and fine-tuning with text classification
and sequence labeling. To employ LLMs, we design prompts for ChatGPT. The
experiments show that with the PLMs, the dependence on labeled training data
has been greatly reduced, and the performance has improved. Meanwhile, we
verify that ChatGPT, a renowned LLM, has potential for further advancement in
this area.
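As a rough, illustrative sketch of the processed input and the LLM prompting described above (the <brk> marker, the 0.15 s pause threshold, and the prompt wording are assumptions for illustration, not the authors' exact setup):

```python
# Minimal sketch: build a break-annotated token sequence from forced-alignment
# output, then frame an overall phrase-break assessment prompt for an LLM.
# The <brk> marker, the 0.15 s threshold, and the prompt text are illustrative
# guesses, not the paper's exact configuration.

from typing import List, Tuple

BREAK_TOKEN = "<brk>"          # hypothetical phrase-break marker
PAUSE_THRESHOLD_SEC = 0.15     # hypothetical pause length that counts as a break


def to_token_sequence(alignment: List[Tuple[str, float]]) -> List[str]:
    """alignment: (word, pause_after_word_in_seconds) pairs from forced alignment."""
    tokens = []
    for word, pause in alignment:
        tokens.append(word)
        if pause >= PAUSE_THRESHOLD_SEC:
            tokens.append(BREAK_TOKEN)
    return tokens


def overall_assessment_prompt(tokens: List[str]) -> str:
    """Frame the token sequence as an LLM prompt for a coarse quality rating."""
    return (
        "The following transcript of an ESL learner's speech marks phrase breaks "
        f"with {BREAK_TOKEN}. Rate the appropriateness of the phrase breaks as "
        "Good, Fair, or Poor, and briefly explain.\n\n" + " ".join(tokens)
    )


if __name__ == "__main__":
    aligned = [("I", 0.02), ("went", 0.03), ("to", 0.30), ("the", 0.02), ("store", 0.60)]
    seq = to_token_sequence(aligned)
    print(seq)   # ['I', 'went', 'to', '<brk>', 'the', 'store', '<brk>']
    print(overall_assessment_prompt(seq))
```

For the fine-grained task, the same token sequence would instead be labeled at each candidate break position (sequence labeling) rather than rated as a whole.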
Related papers
- Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs).
We present a simple yet effective automatic process for creating speech-text pair data.
Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z) - LAST: Language Model Aware Speech Tokenization [24.185165710384997]
We propose a novel approach to training a speech tokenizer by leveraging objectives from pre-trained textual LMs.
Our aim is to transform features from a pre-trained speech model into a new feature space that enables better clustering for speech LMs.
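For context, a minimal sketch of the usual way speech LMs obtain discrete units, i.e. clustering (projected) features from a pre-trained speech encoder; the random features, projection, and cluster count below are placeholders, not the LAST recipe:

```python
# Illustrative only: quantize speech-encoder features into discrete units by
# clustering, the common way speech LMs get their token vocabulary.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 768))      # stand-in for pre-trained speech features
projection = rng.normal(size=(768, 128))   # stand-in for a learned feature transform
projected = frames @ projection

units = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(projected)
print(units[:20])                          # discrete "speech tokens" for a speech LM
```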
arXiv Detail & Related papers (2024-09-05T16:57:39Z) - SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks [94.10497337235083]
We are the first to explore the potential of prompting speech LMs in the domain of speech processing.
We reformulate speech processing tasks into speech-to-unit generation tasks.
We show that the prompting method can achieve competitive performance compared to the strong fine-tuning method.
arXiv Detail & Related papers (2024-08-23T13:00:10Z) - BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing [35.31866559807704]
Modality alignment between speech and text remains an open problem.
We propose the BLSP approach that bootstraps Language-Speech Pre-training via behavior alignment of continuation writing.
We demonstrate that this straightforward process can extend the capabilities of LLMs to speech, enabling speech recognition, speech translation, spoken language understanding, and speech conversation, even in zero-shot cross-lingual scenarios.
arXiv Detail & Related papers (2023-09-02T11:46:05Z) - Assessing Phrase Break of ESL speech with Pre-trained Language Models [6.635783609515407]
This work introduces an approach to assessing phrase breaks in ESL learners' speech with pre-trained language models (PLMs).
Unlike traditional methods, this approach converts speech into token sequences and then leverages the power of PLMs.
arXiv Detail & Related papers (2022-10-28T10:06:06Z) - Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
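A hedged sketch of the idea, scoring candidate label words with ELECTRA's replaced-token-detection head via Hugging Face Transformers; the template, checkpoint, and label words are illustrative assumptions, not the released DPT code:

```python
# Rough sketch of discriminative prompting with ELECTRA: each candidate label
# word is placed in a template and scored by the replaced-token-detection head;
# the candidate the discriminator finds most "original" wins.

import torch
from transformers import AutoTokenizer, ElectraForPreTraining

MODEL = "google/electra-small-discriminator"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = ElectraForPreTraining.from_pretrained(MODEL)
model.eval()

def score_label(sentence: str, label_word: str) -> float:
    """Return the 'replaced' logit for label_word in a hypothetical template
    (lower means the word looks more natural in context)."""
    template = f"{sentence} It was {label_word}."
    enc = tokenizer(template, return_tensors="pt")
    label_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(label_word))[0]
    position = (enc["input_ids"][0] == label_id).nonzero()[-1].item()
    with torch.no_grad():
        logits = model(**enc).logits[0]   # one replaced-vs-original logit per token
    return logits[position].item()

sentence = "The pacing of the talk felt natural and easy to follow."
scores = {w: score_label(sentence, w) for w in ["good", "bad"]}
print(min(scores, key=scores.get))        # candidate judged most original
```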
arXiv Detail & Related papers (2022-05-23T10:11:50Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
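A toy sketch of such a unit-level LSTM LM in PyTorch; the phoneme inventory, dimensions, and training data below are made up for illustration:

```python
# Toy generative LM over sub-word linguistic units (e.g. phonemes), trained to
# predict the next unit at every step; everything here is placeholder data.

import torch
import torch.nn as nn

PHONES = ["<pad>", "b", "a", "d", "g", "o", "t", "<eos>"]   # toy phoneme inventory
stoi = {p: i for i, p in enumerate(PHONES)}

class UnitLSTMLM(nn.Module):
    def __init__(self, vocab: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.head(h)               # next-unit logits at every step

# One training step on a single toy "babbled" sequence.
seq = torch.tensor([[stoi[p] for p in ["b", "a", "d", "a", "<eos>"]]])
model = UnitLSTMLM(len(PHONES))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad()
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, len(PHONES)), seq[:, 1:].reshape(-1))
loss.backward()
opt.step()
print(float(loss))
```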
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text
Joint Pre-Training [33.02912456062474]
We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech.
We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST2 speech translation.
arXiv Detail & Related papers (2021-10-20T00:59:36Z) - COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z) - Warped Language Models for Noise Robust Language Understanding [11.017026606760728]
Masked Language Models (MLMs) are self-supervised neural networks trained to fill in the blanks in a given sentence with masked tokens.
We show that natural language understanding systems built on top of Warped Language Models (WLMs) perform better than those built on MLMs.
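For reference, a minimal illustration of the MLM fill-in-the-blank objective mentioned above (generic masking, not the warped-LM modifications themselves):

```python
# Generic MLM masking: randomly hide tokens and keep them as targets the model
# must recover; purely illustrative.

import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    random.seed(seed)
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)          # the label the MLM must predict
        else:
            masked.append(tok)
            targets.append(None)         # no loss at unmasked positions
    return masked, targets

print(mask_tokens("the quick brown fox jumps over the lazy dog".split()))
```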
arXiv Detail & Related papers (2020-11-03T18:26:28Z) - Byte Pair Encoding is Suboptimal for Language Model Pretraining [49.30780227162387]
We analyze differences between unigram LM tokenization and byte-pair encoding (BPE).
We find that the unigram LM tokenization method matches or outperforms BPE across downstream tasks and two languages.
We hope that developers of future pretrained LMs will consider adopting the unigram LM method over the more prevalent BPE.
arXiv Detail & Related papers (2020-04-07T21:21:06Z)