The Interspeech 2025 Speech Accessibility Project Challenge
- URL: http://arxiv.org/abs/2507.22047v1
- Date: Tue, 29 Jul 2025 17:50:59 GMT
- Title: The Interspeech 2025 Speech Accessibility Project Challenge
- Authors: Xiuwen Zheng, Bornali Phukon, Jonghwan Na, Ed Cutrell, Kyu Han, Mark Hasegawa-Johnson, Pan-Pan Jiang, Aadhrik Kuila, Colin Lea, Bob MacDonald, Gautam Mantena, Venkatesh Ravichandran, Leda Sari, Katrin Tomanek, Chang D. Yoo, Chris Zwilling
- Abstract summary: The 2025 Interspeech Speech Accessibility Project (SAP) Challenge was launched, utilizing over 400 hours of SAP data collected and transcribed from more than 500 individuals with diverse speech disabilities. 12 of 22 valid teams outperformed the whisper-large-v2 baseline on Word Error Rate (WER), and 17 surpassed it on Semantic Score (SemScore). The top team simultaneously achieved the lowest WER of 8.11% and the highest SemScore of 88.44%, setting new benchmarks for future ASR systems in recognizing impaired speech.
- Score: 35.902086799949345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the last decade has witnessed significant advancements in Automatic Speech Recognition (ASR) systems, performance of these systems for individuals with speech disabilities remains inadequate, partly due to limited public training data. To bridge this gap, the 2025 Interspeech Speech Accessibility Project (SAP) Challenge was launched, utilizing over 400 hours of SAP data collected and transcribed from more than 500 individuals with diverse speech disabilities. Hosted on EvalAI and leveraging the remote evaluation pipeline, the SAP Challenge evaluates submissions based on Word Error Rate and Semantic Score. Consequently, 12 out of 22 valid teams outperformed the whisper-large-v2 baseline in terms of WER, while 17 teams surpassed the baseline on SemScore. Notably, the top team achieved the lowest WER of 8.11%, and the highest SemScore of 88.44% at the same time, setting new benchmarks for future ASR systems in recognizing impaired speech.
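For readers unfamiliar with the primary metric: Word Error Rate is the word-level Levenshtein distance between a hypothesis and its reference transcript, normalized by the number of reference words. The sketch below is a minimal, utterance-level illustration in Python; the challenge's official scoring presumably applies its own text normalization and pools edit counts across the whole test set, which this toy function does not.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitute = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            delete = dp[i - 1][j] + 1
            insert = dp[i][j - 1] + 1
            dp[i][j] = min(substitute, delete, insert)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution over four reference words -> WER = 0.25
print(word_error_rate("please call me back", "please call me black"))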
Related papers
- Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges [58.80034860169605]
The CHiME-7 and 8 distant speech recognition (DASR) challenges focus on multi-channel, generalizable, joint automatic speech recognition (ASR) and diarization of conversational speech.
This paper outlines the challenges' design, evaluation metrics, datasets, and baseline systems while analyzing key trends from participant submissions.
arXiv Detail & Related papers (2025-07-24T07:56:24Z)
- The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition [95.95622220065884]
The MISP 2025 Challenge focuses on multi-modal, multi-device meeting transcription by incorporating video modality alongside audio.
The best-performing systems achieved significant improvements over the baseline.
arXiv Detail & Related papers (2025-05-20T06:11:51Z)
- The First VoicePrivacy Attacker Challenge Evaluation Plan [39.256453635652484]
The First VoicePrivacy Attacker Challenge is a new kind of challenge organized as part of the VoicePrivacy initiative and supported by ICASSP 2025 as the SP Grand Challenge.
It focuses on developing attacker systems against voice anonymization, which will be evaluated against a set of anonymization systems submitted to the VoicePrivacy 2024 Challenge.
arXiv Detail & Related papers (2024-10-09T20:48:03Z)
- ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge [94.13624830833314]
This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle.
First-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track.
arXiv Detail & Related papers (2024-01-07T12:51:42Z)
- Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge [25.69349931845173]
The ISCSLP 2022 CSASR challenge provided two training sets, TAL_CSASR corpus and MagicData-RAMC corpus, a development and a test set for participants.
More than 40 teams participated in this challenge, and the winner team achieved 16.70% Mixture Error Rate (MER) performance on the test set.
In this paper, we will describe the datasets, the associated baselines system and the requirements, and summarize the CSASR challenge results and major techniques and tricks used in the submitted systems.
arXiv Detail & Related papers (2022-10-12T11:05:13Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction (see the speed-perturbation sketch after this list).
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge [13.232899176888575]
This paper describes the NTNU system developed for the Interspeech 2020 Non-Native Children's Speech ASR Challenge, supported by the SIG-CHILD group of ISCA.
All participants were restricted to develop their systems merely based on the speech and text corpora provided by the organizer.
To work around this under-resourced issue, we built our ASR system on top of CNN-TDNNF-based acoustic models.
arXiv Detail & Related papers (2020-05-18T02:51:26Z)
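As a rough illustration of the speed-perturbation augmentation referenced in the disordered-speech entry above: resampling a waveform and playing it back at the original rate changes both speaking rate and pitch. This is a sketch under assumptions, not the cited paper's actual pipeline (which also exploits normal speech and speaker adaptation); the 0.9x/1.1x factors are conventional choices, not taken from the paper.

import numpy as np
from scipy.signal import resample

def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    # Resample to len/factor samples; played back at the original sample
    # rate, the utterance sounds faster (factor > 1) or slower (factor < 1).
    new_len = int(round(len(waveform) / factor))
    return resample(waveform, new_len)

# Typical augmentation: keep the original plus slowed and sped-up copies.
audio = np.zeros(16000)  # placeholder for 1 s of 16 kHz speech
augmented = [speed_perturb(audio, f) for f in (0.9, 1.0, 1.1)]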