Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction
- URL: http://arxiv.org/abs/2505.21137v1
- Date: Tue, 27 May 2025 12:50:53 GMT
- Title: Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction
- Authors: Mengjie Qian, Rao Ma, Stefano Bannò, Kate M. Knill, Mark J. F. Gales
- Abstract summary: This work introduces a pseudo-labelling process to address the challenge of limited labelled data. We prompt an E2E Whisper-based SGEC model with fluent transcriptions, showing a slight improvement in SGEC performance. Finally, we assess the impact of increasing model size, revealing that while pseudo-labelled data does not yield performance gain for a larger Whisper model, training with prompts proves beneficial.
- Score: 33.116296120680296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken Grammatical Error Correction (SGEC) and Feedback (SGECF) are crucial for second language learners, teachers and test takers. Traditional SGEC systems rely on a cascaded pipeline consisting of an ASR module, a disfluency detection (DD) and removal module, and a GEC module. With the rise of end-to-end (E2E) speech foundation models, we investigate their effectiveness for SGEC and feedback generation. This work introduces a pseudo-labelling process to address the challenge of limited labelled data, expanding the training data size from 77 hours to approximately 2500 hours and leading to improved performance. Additionally, we prompt an E2E Whisper-based SGEC model with fluent transcriptions, showing a slight improvement in SGEC performance and more significant gains in feedback generation. Finally, we assess the impact of increasing model size, revealing that while pseudo-labelled data does not yield performance gains for a larger Whisper model, training with prompts remains beneficial.
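As a rough illustration of the prompting mechanism described above, the sketch below feeds a fluent transcription to a Whisper model as a decoder prompt via Hugging Face Transformers. The paper's fine-tuned SGEC checkpoints are not public, so the base openai/whisper-small model and the example inputs are stand-ins.

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Random log-Mel features stand in for a real 30-second learner utterance.
input_features = torch.randn(1, 80, 3000)

# The fluent (disfluency-removed) transcription is supplied as a decoder
# prompt, so generation can condition on it when producing the output text.
prompt_ids = processor.get_prompt_ids("I go to school every day", return_tensors="pt")
predicted_ids = model.generate(input_features, prompt_ids=prompt_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```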
Related papers
- End-to-End Spoken Grammatical Error Correction [33.116296120680296]
Grammatical Error Correction (GEC) and feedback play a vital role in supporting second language (L2) learners, educators, and examiners. While written GEC is well-established, spoken GEC (SGEC) poses additional challenges due to disfluencies, transcription errors, and the lack of structured input. This work examines an End-to-End (E2E) framework for SGEC and feedback generation, highlighting challenges and possible solutions.
arXiv Detail & Related papers (2025-06-23T11:40:04Z)
- Adaptive Rank Allocation for Federated Parameter-Efficient Fine-Tuning of Language Models [40.69348434971122]
We propose FedARA, a novel Adaptive Rank Allocation framework for federated parameter-efficient fine-tuning of language models. FedARA consistently outperforms baselines by an average of 6.95% to 8.49% across various datasets and models under heterogeneous data. Experiments on various edge devices demonstrate substantial decreases in total training time and energy consumption by up to 48.90% and 46.95%, respectively.
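FedARA's implementation is not public; the hedged sketch below only illustrates the underlying idea of allocating different LoRA adapter ranks to different federated clients, using the peft library. The client names, rank budgets, and base model are invented for the example.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

def make_client_model(base_name: str, rank: int):
    base = AutoModelForSequenceClassification.from_pretrained(base_name, num_labels=2)
    cfg = LoraConfig(r=rank, lora_alpha=2 * rank, target_modules=["query", "value"])
    return get_peft_model(base, cfg)

# Clients with more data or compute are given a larger adapter rank.
client_ranks = {"client_a": 16, "client_b": 8, "client_c": 4}
models = {cid: make_client_model("bert-base-uncased", r) for cid, r in client_ranks.items()}
for cid, m in models.items():
    print(cid)
    m.print_trainable_parameters()  # adapter size grows with the allocated rank
```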
arXiv Detail & Related papers (2025-01-24T11:19:07Z)
- BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency [5.1205362176467055]
We focus on Variation Sets (VSs), sets of consecutive utterances expressing a similar intent with slightly different words and structures. To assess the impact of VSs on training data efficiency, we augment child-directed speech (CDS) data with different proportions of artificial VSs and use these datasets to train an auto-regressive model, GPT-2. We find that the best proportion of VSs depends on the evaluation benchmark: BLiMP and GLUE scores benefit from the presence of VSs, but EWOK scores do not.
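A toy sketch of the augmentation step, under the assumption that a paraphrase generator is available; the proportion, set size, and placeholder paraphraser below are illustrative, not the authors' setup.

```python
import random

def add_variation_sets(utterances, make_paraphrases, proportion=0.2, set_size=3):
    """Replace a given proportion of utterances with small blocks of
    consecutive paraphrases expressing the same intent (a variation set)."""
    augmented = []
    for utt in utterances:
        if random.random() < proportion:
            augmented.extend(make_paraphrases(utt, set_size))
        else:
            augmented.append(utt)
    return augmented

# Placeholder paraphraser; a real one would vary wording and structure.
paraphrase = lambda utt, n: [utt] + [f"{utt}, okay?"] * (n - 1)
corpus = ["put the ball in the box", "where is the doggy"]
print(add_variation_sets(corpus, paraphrase, proportion=0.5))
```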
arXiv Detail & Related papers (2024-11-14T16:57:46Z)
- Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification [19.893213508284813]
Self-supervised adaptive pre-training (SAPT) is proposed to adapt the pre-trained model to the target domain and languages of the downstream task.
We show that SAPT improves XLSR performance on the FLEURS benchmark with substantial gains up to 40.1% for under-represented languages.
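The SAPT recipe itself is not released; the sketch below shows what continued self-supervised (wav2vec 2.0-style contrastive) pre-training of an XLSR checkpoint on target-domain audio can look like in Transformers. The masking hyper-parameters and the random audio are assumptions, and the two masking helpers are private library utilities.

```python
import torch
from transformers import Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-xls-r-300m")
raw_audio = torch.randn(1, 16000)  # stand-in for 1 s of 16 kHz in-domain speech

# Length of the latent feature sequence after the convolutional front end.
seq_len = int(model._get_feat_extract_output_lengths(torch.tensor(raw_audio.shape[-1])))
mask = _compute_mask_indices((1, seq_len), mask_prob=0.65, mask_length=10)
negatives = _sample_negative_indices((1, seq_len), num_negatives=100, mask_time_indices=mask)

outputs = model(
    raw_audio,
    mask_time_indices=torch.tensor(mask, dtype=torch.bool),
    sampled_negative_indices=torch.tensor(negatives, dtype=torch.long),
)
outputs.loss.backward()  # contrastive + diversity loss on target-domain audio
```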
arXiv Detail & Related papers (2023-12-12T14:58:08Z)
- Grammatical Error Correction via Mixed-Grained Weighted Training [68.94921674855621]
Grammatical Error Correction (GEC) aims to automatically correct grammatical errors in natural texts.
MainGEC assigns token-level and sentence-level training weights based on inherent discrepancies in the accuracy and potential diversity of data annotation.
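The exact weighting scheme is defined in the paper; the sketch below only shows the generic mechanics of combining per-token and per-sentence weights in a seq2seq cross-entropy loss, with all weights assumed to be given.

```python
import torch
import torch.nn.functional as F

def mixed_grained_loss(logits, targets, token_w, sent_w, pad_id=0):
    """logits: (B, T, V); targets: (B, T);
    token_w: (B, T) per-token weights; sent_w: (B,) per-sentence weights."""
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, ignore_index=pad_id, reduction="none"
    )  # (B, T)
    mask = (targets != pad_id).float()
    weighted = per_token * token_w * sent_w.unsqueeze(1) * mask
    return weighted.sum() / mask.sum()

logits = torch.randn(2, 5, 100)           # toy batch: 2 sentences, 5 tokens, vocab 100
targets = torch.randint(1, 100, (2, 5))
loss = mixed_grained_loss(logits, targets, torch.ones(2, 5), torch.tensor([1.0, 0.5]))
```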
arXiv Detail & Related papers (2023-11-23T08:34:37Z)
- Towards End-to-End Spoken Grammatical Error Correction [33.116296120680296]
Spoken grammatical error correction (GEC) aims to supply feedback to L2 learners on their use of grammar when speaking.
This paper introduces an alternative "end-to-end" approach to spoken GEC, exploiting a speech recognition foundation model, Whisper.
arXiv Detail & Related papers (2023-11-09T17:49:02Z)
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER on the CN-Celeb evaluation set.
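Setting the paper's specifics aside, the hedged sketch below illustrates the general idea of difficulty-aware augmentation in embedding space: perturb each speaker embedding along estimated intra-class variation, scaled by a per-sample difficulty score. The scaling rule and statistics here are invented for illustration.

```python
import torch

def dasa_style_augment(embeddings, class_std, difficulty, alpha=0.5):
    """embeddings: (B, D) speaker embeddings; class_std: (B, D) per-class
    standard deviations; difficulty: (B,) scores in [0, 1]."""
    noise = torch.randn_like(embeddings) * class_std
    scale = alpha * difficulty.unsqueeze(1)  # harder samples get stronger noise
    return embeddings + scale * noise

emb = torch.randn(4, 192)                  # e.g. 192-dim ECAPA-style embeddings
std = 0.1 * torch.ones(4, 192)             # placeholder intra-class statistics
difficulty = torch.tensor([0.1, 0.4, 0.7, 0.9])
augmented = dasa_style_augment(emb, std, difficulty)
```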
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- UZH_CLyp at SemEval-2023 Task 9: Head-First Fine-Tuning and ChatGPT Data Generation for Cross-Lingual Learning in Tweet Intimacy Prediction [3.1798318618973362]
This paper describes the submission of UZH_CLyp for the SemEval 2023 Task 9 "Multilingual Tweet Intimacy Analysis".
We achieved second-best results in all 10 languages according to the official Pearson's correlation regression evaluation measure.
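As a sketch of the head-first idea (two-stage fine-tuning, with details assumed rather than taken from the system description): first train only the regression head on top of a frozen encoder, then unfreeze everything for a short second stage.

```python
from transformers import AutoModelForSequenceClassification

# num_labels=1 gives a regression head, matching the intimacy-score task.
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=1)

# Stage 1: freeze the transformer body so only the head receives gradients.
for p in model.roberta.parameters():
    p.requires_grad = False
# ... train the head for a few epochs ...

# Stage 2: unfreeze and fine-tune end to end, typically at a lower learning rate.
for p in model.roberta.parameters():
    p.requires_grad = True
# ... continue training ...
```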
arXiv Detail & Related papers (2023-03-02T12:18:53Z)
- An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition [51.232523987916636]
Differential privacy (DP) is one data protection avenue to safeguard user information used for training deep models by imposing noisy distortion on private data.
In this work, we extend PATE learning to handle dynamic patterns, namely speech, and conduct a first experimental study on ASR aimed at avoiding acoustic data leakage.
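For readers unfamiliar with PATE, a minimal sketch of its noisy-argmax teacher aggregation follows; the paper applies the framework to ASR, whereas this toy shows a plain label vote with an assumed noise scale.

```python
import numpy as np

def noisy_vote(teacher_labels, num_classes, gamma=0.1, seed=0):
    """teacher_labels: one prediction per teacher for a single query sample."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(teacher_labels, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=1.0 / gamma, size=num_classes)  # DP noise
    return int(np.argmax(counts))

teachers = np.array([2, 2, 1, 2, 0, 2, 1, 2])  # 8 teachers voting over 3 classes
print(noisy_vote(teachers, num_classes=3))
```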
arXiv Detail & Related papers (2022-10-11T16:55:54Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
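A minimal sketch of SpecAugment-style spectrogram augmentation with torchaudio; the mask sizes are illustrative, not the authors' settings.

```python
import torch
import torchaudio.transforms as T

spec = torch.randn(1, 128, 400)  # placeholder (channel, mel bins, frames)

augment = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=15),  # zero out a random band of mel bins
    T.TimeMasking(time_mask_param=35),       # zero out a random span of frames
)
augmented = augment(spec)
```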
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches that perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained on small amounts of in-domain data.
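A schematic of the joint setup, not the authors' architecture: a shared encoder over ASR hypothesis tokens feeds both a token-level correction head and an utterance-level intent head, so the two tasks share representations.

```python
import torch
import torch.nn as nn

class JointCorrectionLU(nn.Module):
    def __init__(self, vocab_size, num_intents, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.correction_head = nn.Linear(2 * dim, vocab_size)  # per-token rewrite
        self.intent_head = nn.Linear(2 * dim, num_intents)     # utterance-level LU

    def forward(self, asr_token_ids):
        hidden, _ = self.encoder(self.embed(asr_token_ids))
        return self.correction_head(hidden), self.intent_head(hidden.mean(dim=1))

model = JointCorrectionLU(vocab_size=10000, num_intents=20)
tokens = torch.randint(0, 10000, (2, 12))  # a batch of two ASR hypotheses
correction_logits, intent_logits = model(tokens)
```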
arXiv Detail & Related papers (2020-01-28T22:09:25Z)