Related papers: Mitigating Data Imbalance in Automated Speaking Assessment

Mitigating Data Imbalance in Automated Speaking Assessment

URL: http://arxiv.org/abs/2509.03010v1
Date: Wed, 03 Sep 2025 04:38:13 GMT
Title: Mitigating Data Imbalance in Automated Speaking Assessment
Authors: Fong-Chun Tsai, Kuan-Tang Huang, Bi-Cheng Yan, Tien-Hong Lo, Berlin Chen,
Abstract summary: We introduce a novel objective for training ASA models, dubbed the Balancing Logit Variation (BLV) loss.<n>We show that integrating the BLV loss into a celebrated text-based (BERT) model significantly enhances classification accuracy and fairness.
Score: 15.293800869580151
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automated Speaking Assessment (ASA) plays a crucial role in evaluating second-language (L2) learners proficiency. However, ASA models often suffer from class imbalance, leading to biased predictions. To address this, we introduce a novel objective for training ASA models, dubbed the Balancing Logit Variation (BLV) loss, which perturbs model predictions to improve feature representation for minority classes without modifying the dataset. Evaluations on the ICNALE benchmark dataset show that integrating the BLV loss into a celebrated text-based (BERT) model significantly enhances classification accuracy and fairness, making automated speech evaluation more robust for diverse learners.

Related papers

Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels [6.254549196597175]
We propose a zero-shot grammar competency estimation framework that leverages unlabeled data and Large Language Models (LLMs) without relying on manual labels.<n>During training, we employ LLM-generated predictions on unlabeled data by using grammar competency-based prompts.<n>We show that the choice of LLM for pseudo-label generation critically affects model performance and that the ratio of clean-to-noisy samples during training strongly influences stability and accuracy.
arXiv Detail & Related papers (2025-11-17T09:00:26Z)
Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learners oral proficiency using automatic models. This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization. An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
arXiv Detail & Related papers (2024-09-23T02:11:24Z)
An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution [5.1660803395535835]
Self-supervised learning (SSL) has shown stellar performance compared to traditional methods.<n>However, SSL-based ASA systems are faced with at least three data-related challenges.<n>These challenges include limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels.
arXiv Detail & Related papers (2024-04-11T09:06:49Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score. Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR [1.8477401359673709]
Class-probability-based confidence scores do not accurately represent quality of overconfident ASR predictions. We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train Confidence Estimation Model (CEM) We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
arXiv Detail & Related papers (2024-01-06T16:29:13Z)
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations. We study the impact of labeled data through in-context learning and finetuning. We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
ASR in German: A Detailed Error Analysis [0.0]
This work presents a selection of ASR model architectures that are pretrained on the German language and evaluates them on a benchmark of diverse test datasets. It identifies cross-architectural prediction errors, classifies those into categories and traces the sources of errors per category back into training data.
arXiv Detail & Related papers (2022-04-12T08:25:01Z)
Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews. We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
NoiER: An Approach for Training more Reliable Fine-TunedDownstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data. The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
Unsupervised neural adaptation model based on optimal transport for spoken language identification [54.96267179988487]
Due to the mismatch of statistical distributions of acoustic speech between training and testing sets, the performance of spoken language identification (SLID) could be drastically degraded. We propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID.
arXiv Detail & Related papers (2020-12-24T07:37:19Z)
Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU) We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.