Related papers: Deep Learning for Assessment of Oral Reading Fluency

Deep Learning for Assessment of Oral Reading Fluency

URL: http://arxiv.org/abs/2405.19426v2
Date: Sat, 1 Jun 2024 14:16:42 GMT
Title: Deep Learning for Assessment of Oral Reading Fluency
Authors: Mithilesh Vaidya, Binaya Kumar Sahoo, Preeti Rao,
Abstract summary: This work investigates end-to-end modeling on a training dataset of children's audio recordings of story texts labeled by human experts. We report the performance of a number of system variations on the relevant measures, and probe the learned embeddings for lexical and acoustic-prosodic features known to be important to the perception of reading fluency.
Score: 5.707725771108279
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reading fluency assessment is a critical component of literacy programmes, serving to guide and monitor early education interventions. Given the resource intensive nature of the exercise when conducted by teachers, the development of automatic tools that can operate on audio recordings of oral reading is attractive as an objective and highly scalable solution. Multiple complex aspects such as accuracy, rate and expressiveness underlie human judgements of reading fluency. In this work, we investigate end-to-end modeling on a training dataset of children's audio recordings of story texts labeled by human experts. The pre-trained wav2vec2.0 model is adopted due its potential to alleviate the challenges from the limited amount of labeled data. We report the performance of a number of system variations on the relevant measures, and also probe the learned embeddings for lexical and acoustic-prosodic features known to be important to the perception of reading fluency.

Related papers

An End-to-End Approach for Child Reading Assessment in the Xhosa Language [0.3579433677269426]
This study focuses on Xhosa, a language spoken in South Africa, to advance child speech recognition capabilities.<n>We present a novel dataset composed of child speech samples in Xhosa.<n>The results indicate that the performance of these models can be significantly influenced by the amount and balancing of the available training data.
arXiv Detail & Related papers (2025-05-23T00:59:58Z)
LLMs as Educational Analysts: Transforming Multimodal Data Traces into Actionable Reading Assessment Reports [6.523137821124204]
This study investigates the use of multimodal data sources to derive meaningful reading insights. We employ unsupervised learning techniques to identify distinct reading behavior patterns. A large language model (LLM) synthesizes the derived information into actionable reports for educators.
arXiv Detail & Related papers (2025-03-03T22:34:08Z)
Dementia Insights: A Context-Based MultiModal Approach [0.3749861135832073]
Early detection is crucial for timely interventions that may slow disease progression. Large pre-trained models (LPMs) for text and audio have shown promise in identifying cognitive impairments. This study proposes a context-based multimodal method, integrating both text and audio data using the best-performing LPMs.
arXiv Detail & Related papers (2025-03-03T06:46:26Z)
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment [65.70317151363204]
This work introduces the first framework for reconstructing surgical dialogue from unstructured real-world recordings. In surgical training, the formative verbal feedback that trainers provide to trainees during live surgeries is crucial for ensuring safety, correcting behavior immediately, and facilitating long-term skill acquisition. Our framework integrates voice activity detection, speaker diarization, and automated speech recaognition, with a novel enhancement that removes hallucinations.
arXiv Detail & Related papers (2024-12-01T10:35:12Z)
Multi-Modal Self-Supervised Learning for Surgical Feedback Effectiveness Assessment [66.6041949490137]
We propose a method that integrates information from transcribed verbal feedback and corresponding surgical video to predict feedback effectiveness. Our findings show that both transcribed feedback and surgical video are individually predictive of trainee behavior changes. Our results demonstrate the potential of multi-modal learning to advance the automated assessment of surgical feedback.
arXiv Detail & Related papers (2024-11-17T00:13:00Z)
The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education [3.967610895056427]
This paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices. We confront two challenges inherent in NLP-based instructional analysis, including noisy and long input data and highly skewed distributions of human ratings.
arXiv Detail & Related papers (2024-04-03T04:15:29Z)
Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation [22.38338205905379]
We leverage reinforcement learning algorithms to overcome the above challenges by introducing a novel reward function. Our reward function combines an accuracy metric and a faithfulness metric to provide a balanced quality judgment of generated responses.
arXiv Detail & Related papers (2023-11-02T02:42:41Z)
A Survey of the Impact of Self-Supervised Pretraining for Diagnostic Tasks with Radiological Images [71.26717896083433]
Self-supervised pretraining has been observed to be effective at improving feature representations for transfer learning. This review summarizes recent research into its usage in X-ray, computed tomography, magnetic resonance, and ultrasound imaging.
arXiv Detail & Related papers (2023-09-05T19:45:09Z)
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring [13.817385516193445]
Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features. Deep neural networks are commonly trained to map fluency-related features into the human scores. We introduce a self-supervised learning (SSL) approach that takes into account phonetic and prosody awareness for fluency scoring.
arXiv Detail & Related papers (2023-05-19T05:39:41Z)
A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors [56.554277096170246]
We present an empirical study that evaluates and contrasts four commonly employed annotation methods in user studies focused on in-the-wild data collection. For both the user-driven, in situ annotations, where participants annotate their activities during the actual recording process, and the recall methods, where participants retrospectively annotate their data at the end of each day, the participants had the flexibility to select their own set of activity classes and corresponding labels.
arXiv Detail & Related papers (2023-05-15T16:02:56Z)
Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults. Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations. This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks. A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
Deep Learning for Prominence Detection in Children's Read Speech [13.041607703862724]
We consider a labeled dataset of children's reading recordings for the speaker-independent detection of prominent words. A previous well-tuned random forest ensemble predictor is replaced by an RNN sequence to exploit potential context dependency. Deep learning is applied to obtain word-level features from low-level acoustic contours of fundamental frequency, intensity and spectral shape.
arXiv Detail & Related papers (2021-04-12T14:15:08Z)
Confident Coreset for Active Learning in Medical Image Analysis [57.436224561482966]
We propose a novel active learning method, confident coreset, which considers both uncertainty and distribution for effectively selecting informative samples. By comparative experiments on two medical image analysis tasks, we show that our method outperforms other active learning methods.
arXiv Detail & Related papers (2020-04-05T13:46:16Z)
A Neural Approach to Ordinal Regression for the Preventive Assessment of Developmental Dyslexia [0.0]
Developmental Dyslexia (DD) is a learning disability related to the acquisition of reading skills that affects about 5% of the population. We propose a new methodology to assess the risk of DD before students learn to read.
arXiv Detail & Related papers (2020-02-06T10:08:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.