Grammar Based Identification Of Speaker Role For Improving ATCO And
Pilot ASR
- URL: http://arxiv.org/abs/2108.12175v1
- Date: Fri, 27 Aug 2021 08:40:08 GMT
- Title: Grammar Based Identification Of Speaker Role For Improving ATCO And
Pilot ASR
- Authors: Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Oliver Ohneiser,
Hartmut Helmke, Saeed Sarfjoo, Iuliia Nigmatulina
- Abstract summary: Assistant Based Speech Recognition (ABSR) for air traffic control is generally trained by pooling both Air Traffic Controller (ATCO) and pilot data.
Due to the data imbalance between ATCO and pilot data and their varying acoustic conditions, ASR performance is usually significantly better for ATCOs than for pilots.
- Score: 1.1391158217994781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assistant Based Speech Recognition (ABSR) for air traffic control is
generally trained by pooling both Air Traffic Controller (ATCO) and pilot data.
In practice, this is motivated by the fact that the proportion of pilot data is
smaller than that of ATCO data, while their standard language of communication is
similar. However, due to the data imbalance between ATCO and pilot data and their
varying acoustic conditions, ASR performance is usually significantly better for
ATCOs than for pilots. In this paper, we propose to (1) split the ATCO and pilot
data using an automatic approach exploiting ASR transcripts, and (2) treat ATCO
and pilot ASR as two separate tasks for Acoustic Model (AM) training. For speaker
role classification of ATCO and pilot data, a hypothesized ASR transcript is
generated with a seed model and subsequently used to classify the speaker role
based on the knowledge extracted from the grammar defined by the International
Civil Aviation Organization (ICAO). This approach provides an average speaker
role identification accuracy of 83% for ATCO and pilot. Finally, we show that
training AMs separately for each task, or using a multitask approach, is better
suited to this data than an AM trained by pooling all data.
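The grammar-based classification step lends itself to a short illustration. The sketch below scores a hypothesized ASR transcript against a handful of ICAO-style phraseology cues and assigns a speaker role; the cue lists, regular expressions, tie-breaking, and example transcripts are illustrative assumptions, not the grammar used in the paper.

```python
import re

# Illustrative ICAO-style phraseology cues (assumed for this sketch, not the
# paper's actual grammar): ATCOs typically issue clearances and instructions,
# while pilots typically read back, acknowledge, or request.
ATCO_CUES = [
    r"\bcleared (to land|for takeoff|ils)\b",
    r"\b(climb|descend) (to )?(flight level|altitude)\b",
    r"\bcontact (tower|ground|approach|radar)\b",
    r"\bturn (left|right) heading\b",
    r"\bsquawk\b",
]
PILOT_CUES = [
    r"\b(wilco|roger)\b",
    r"\brequest(ing)? (descent|climb|taxi|start ?up)\b",
    r"\bwith you\b",
    r"\bfully established\b",
    r"\bready for departure\b",
]

def classify_speaker_role(transcript: str) -> str:
    """Return 'ATCO', 'pilot', or 'unknown' for a hypothesized ASR transcript."""
    text = transcript.lower()
    atco_score = sum(bool(re.search(p, text)) for p in ATCO_CUES)
    pilot_score = sum(bool(re.search(p, text)) for p in PILOT_CUES)
    if atco_score > pilot_score:
        return "ATCO"
    if pilot_score > atco_score:
        return "pilot"
    return "unknown"

print(classify_speaker_role(
    "lufthansa one two three turn left heading two five zero cleared ils runway two five"))  # ATCO
print(classify_speaker_role(
    "descending flight level one two zero wilco lufthansa one two three"))  # pilot
```

In the paper, the hypothesized transcripts come from a seed ASR model, and utterances labelled this way are then routed to role-specific or multitask acoustic model training.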
Related papers
- Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control [60.35553925189286]
We propose a transformer-based joint ASR-SRD system that solves both tasks jointly while relying on a standard ASR architecture.
We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets; a sketch of one possible joint output format follows this entry.
arXiv Detail & Related papers (2024-06-19T21:11:01Z)
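One common way to realize such a joint system is to let the ASR target sequence carry a speaker-role tag that is decoded together with the words; the tag convention below is an assumption for illustration, not necessarily the format used in the cited paper.

```python
# Illustrative only: parse a jointly decoded hypothesis in which the model emits
# a role tag before the transcript. The "<atco>"/"<pilot>" tags are assumed here.
ROLE_TAGS = {"<atco>": "ATCO", "<pilot>": "pilot"}

def split_role_and_text(joint_hypothesis: str) -> tuple[str, str]:
    """Split e.g. '<atco> lufthansa three two alpha descend ...' into (role, transcript)."""
    first, _, rest = joint_hypothesis.strip().partition(" ")
    role = ROLE_TAGS.get(first.lower())
    if role is None:
        return "unknown", joint_hypothesis.strip()
    return role, rest

print(split_role_and_text("<atco> lufthansa three two alpha descend flight level one two zero"))
# ('ATCO', 'lufthansa three two alpha descend flight level one two zero')
```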
- ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications [51.24043482906732]
We introduce the ATCO2 corpus, a dataset that aims at fostering research in the challenging air traffic control (ATC) field.
The ATCO2 corpus is split into three subsets.
We expect the ATCO2 corpus will foster research on robust ASR and NLU.
arXiv Detail & Related papers (2022-11-08T07:26:45Z)
- Call-sign recognition and understanding for noisy air-traffic transcripts using surveillance information [72.20674534231314]
Air traffic control (ATC) relies on communication via speech between pilot and air-traffic controller (ATCO).
The call-sign, as the unique identifier for each flight, is used by the ATCO to address a specific pilot.
We propose a new call-sign recognition and understanding (CRU) system that addresses this issue.
The recognizer is trained to identify call-signs in noisy ATC transcripts and convert them into the standard International Civil Aviation Organization (ICAO) format; a toy sketch of this normalization follows this entry.
arXiv Detail & Related papers (2022-04-13T11:30:42Z)
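As a toy illustration of the normalization such a CRU system performs (the actual system is a trained model, not a lookup table), the snippet below maps a spoken call-sign to its ICAO written form; the airline designator table and word lists are assumed and heavily truncated.

```python
# Map a spoken call-sign to an ICAO-style identifier, e.g.
# "lufthansa three two alpha" -> "DLH32A". Tables below are illustrative and partial.
AIRLINE_DESIGNATORS = {"lufthansa": "DLH", "speedbird": "BAW", "ryanair": "RYR"}
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9", "niner": "9"}
NATO = {"alpha": "A", "bravo": "B", "charlie": "C", "delta": "D", "echo": "E"}

def spoken_to_icao(words: str) -> str:
    """Convert a spoken call-sign into a written ICAO-style identifier."""
    out = []
    for w in words.lower().split():
        if w in AIRLINE_DESIGNATORS:
            out.append(AIRLINE_DESIGNATORS[w])
        elif w in DIGITS:
            out.append(DIGITS[w])
        elif w in NATO:
            out.append(NATO[w])
        # all other words (e.g. "flight") are simply ignored
    return "".join(out)

print(spoken_to_icao("lufthansa three two alpha"))  # DLH32A
print(spoken_to_icao("speedbird one two niner"))    # BAW129
```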
- Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
ASR errors directly affect the quality of the output summary in the cascade approach.
We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
arXiv Detail & Related papers (2021-11-16T03:00:29Z)
- A Comparative Study of Speaker Role Identification in Air Traffic Communication Using Deep Learning Approaches [9.565067058593316]
We formulate the speaker role identification (SRI) task of controller-pilot communication as a binary classification problem.
To ablate the impacts of the compared approaches, various advanced neural network architectures are applied.
The proposed MMSRINet shows competitive performance and robustness compared with the other methods on both seen and unseen data.
arXiv Detail & Related papers (2021-11-03T07:00:20Z)
- BERTraffic: A Robust BERT-Based Approach for Speaker Change Detection and Role Identification of Air-Traffic Communications [2.270534915073284]
When the Speech Activity Detection (SAD) or diarization system fails, two or more single-speaker segments end up in the same recording.
We developed a system that combines the segmentation of a SAD module with a BERT-based model that performs Speaker Change Detection (SCD) and Speaker Role Identification (SRI) based on ASR transcripts (i.e., diarization + SRI); a minimal sketch of the SRI step follows this entry.
The proposed model reaches up to 0.90/0.95 F1-score on ATCO/pilot for SRI on several test sets.
arXiv Detail & Related papers (2021-10-12T07:25:12Z)
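A minimal sketch of the SRI component in such a pipeline, assuming a standard Hugging Face sequence-classification setup; the checkpoint name, label mapping, and absence of fine-tuning here are placeholders rather than the BERTraffic configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = {0: "ATCO", 1: "pilot"}  # assumed label order

# "bert-base-uncased" is only a starting checkpoint; in practice the classifier
# head would first be fine-tuned on role-labelled ATC transcripts.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def predict_role(transcript: str) -> str:
    """Classify an ASR transcript segment as ATCO or pilot speech."""
    inputs = tokenizer(transcript, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(predict_role("ryanair one two three descend flight level eight zero"))
```

Without fine-tuning the prediction above is essentially random; the point of the sketch is the interface, transcript text in and a role label out, which is what the diarization + SRI pipeline consumes.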
- Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems [0.6465251961564605]
The callsign used to address an airplane is an essential part of all ATCO-pilot communications.
We propose a two-step approach to add contextual knowledge during semi-supervised training to reduce ASR error rates.
arXiv Detail & Related papers (2021-04-08T09:53:54Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)