Contextual Semi-Supervised Learning: An Approach To Leverage
Air-Surveillance and Untranscribed ATC Data in ASR Systems
- URL: http://arxiv.org/abs/2104.03643v1
- Date: Thu, 8 Apr 2021 09:53:54 GMT
- Title: Contextual Semi-Supervised Learning: An Approach To Leverage
Air-Surveillance and Untranscribed ATC Data in ASR Systems
- Authors: Juan Zuluaga-Gomez and Iuliia Nigmatulina and Amrutha Prasad and Petr
Motlicek and Karel Veselý and Martin Kocour and Igor Szöke
- Abstract summary: The callsign used to address an airplane is an essential part of all ATCo-pilot communications.
We propose a two-step approach that adds contextual knowledge during semi-supervised training to reduce the ASR system's error rates.
- Score: 0.6465251961564605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Air traffic management and specifically air-traffic control (ATC) rely mostly
on voice communications between Air Traffic Controllers (ATCos) and pilots. In
most cases, these voice communications follow a well-defined grammar that could
be leveraged in Automatic Speech Recognition (ASR) technologies. The callsign
used to address an airplane is an essential part of all ATCo-pilot
communications. We propose a two-step approach to add contextual knowledge
during semi-supervised training to reduce the ASR system's error rates when
recognizing the part of the utterance that contains the callsign. Initially, we
represent in a WFST the contextual knowledge (i.e. air-surveillance data) of an
ATCo-pilot communication. Then, during Semi-Supervised Learning (SSL) the
contextual knowledge is added by second-pass decoding (i.e. lattice
re-scoring). Results show that 'unseen domains' (e.g. data from airports not
present in the supervised training data) are further aided by contextual SSL
when compared to standalone SSL. For this task, we introduce the Callsign Word
Error Rate (CA-WER) as an evaluation metric, which only assesses ASR
performance on the spoken callsign in an utterance. We obtained a 32.1% relative
CA-WER improvement by applying SSL, and a further 17.5% CA-WER improvement by
adding contextual knowledge during SSL, on a challenging ATC test set gathered
from LiveATC.
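The contextual step can be pictured concretely. Below is a minimal sketch, not the authors' implementation, of turning a list of callsigns obtained from air-surveillance data into an OpenFst-style text-format acceptor whose paths carry a negative cost; composing such a grammar with first-pass lattices during rescoring boosts the callsign word sequences. The callsign list, the per-word boost value, and the word-level symbol handling are all illustrative assumptions.

```python
# Minimal sketch (assumptions: callsign list, per-word boost, plain word
# labels). Emits an AT&T text-format FST that could be compiled with
# OpenFst's fstcompile and composed with first-pass lattices.

BOOST_PER_WORD = -1.0  # negative cost = "cheaper" path; tune on dev data


def callsign_boost_fst(callsigns):
    """Return text-format FST lines boosting each callsign phrase.

    State 0 is both start and final; each callsign is a path
    0 -> ... -> 0. A real rescoring grammar also needs a self-loop
    over the rest of the vocabulary (a "sigma" arc) so non-callsign
    words pass through unweighted.
    """
    lines = []
    next_state = 1
    for phrase in callsigns:
        words = phrase.split()
        src = 0
        for i, word in enumerate(words):
            dst = 0 if i == len(words) - 1 else next_state
            if dst != 0:
                next_state += 1
            # format: src  dst  input-label  output-label  weight
            lines.append(f"{src} {dst} {word} {word} {BOOST_PER_WORD}")
            src = dst
    lines.append("0")  # state 0 is final, with zero final cost
    return "\n".join(lines)


if __name__ == "__main__":
    # Verbalized ICAO callsigns as they might come from a surveillance feed
    print(callsign_boost_fst([
        "lufthansa three two alfa",
        "speedbird one five seven",
    ]))
```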
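Similarly, CA-WER can be read as a standard word error rate restricted to the callsign portion of the reference. The sketch below is one plausible realization under that reading (the paper's exact definition may differ): it aligns the hypothesis to the reference with word-level edit distance and counts only the errors that touch the annotated callsign span.

```python
# One plausible realization of CA-WER (assumption: the callsign span is
# annotated in the reference as [start, end) word indices).

def ca_wer(ref, hyp, span):
    """Word error rate counted only over the reference callsign span."""
    n, m = len(ref), len(hyp)
    # Standard word-level Levenshtein DP table.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace, counting only the errors that touch the callsign span.
    start, end = span
    i, j, errors = n, m, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1] and start <= i - 1 < end:
                errors += 1  # substitution inside the callsign
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            if start <= i - 1 < end:
                errors += 1  # deletion inside the callsign
            i -= 1
        else:
            if start <= i - 1 < end:  # attribute insertions to ref context
                errors += 1
            j -= 1
    return errors / max(end - start, 1)


if __name__ == "__main__":
    ref = "lufthansa three two alfa descend flight level one two zero".split()
    hyp = "lufthansa three two alpha descend level one two zero".split()
    print(ca_wer(ref, hyp, (0, 4)))  # 0.25: one substitution in a 4-word callsign
```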
Related papers
- Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control [60.35553925189286]
We propose a transformer-based joint ASR-SRD system that solves both tasks jointly while relying on a standard ASR architecture.
We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets.
arXiv Detail & Related papers (2024-06-19T21:11:01Z)
- Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks [66.78640306687227]
To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems.
The commonly used FL approach (i.e., FedAvg) in S2T tasks typically suffers from extensive communication overhead.
We propose a personalized federated S2T framework that introduces FedLoRA, a lightweight LoRA module for client-side tuning and interaction with the server, and FedMem, a global model equipped with a $k$-nearest-neighbor classifier.
arXiv Detail & Related papers (2024-01-18T15:39:38Z)
- Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding [3.4713477325880464]
The ATCO2 project aimed to develop a unique platform to collect and preprocess large amounts of ATC data from airspace in real time.
This paper reviews previous work from ATCO2 partners, including robust automatic speech recognition.
We believe that the pipeline developed during the ATCO2 project, along with the open-sourcing of its data, will encourage research in the ATC field.
arXiv Detail & Related papers (2023-05-02T02:04:33Z)
- A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers [0.797970449705065]
We propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training.
The engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding.
To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools.
arXiv Detail & Related papers (2023-04-16T17:45:21Z)
- ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications [51.24043482906732]
We introduce the ATCO2 corpus, a dataset that aims to foster research in the challenging air traffic control (ATC) field.
The ATCO2 corpus is split into three subsets.
We expect the ATCO2 corpus will foster research on robust ASR and NLU.
arXiv Detail & Related papers (2022-11-08T07:26:45Z)
- Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z)
- Call-sign recognition and understanding for noisy air-traffic transcripts using surveillance information [72.20674534231314]
Air traffic control (ATC) relies on spoken communication between pilots and air-traffic controllers (ATCOs).
The call-sign, as the unique identifier of each flight, is used by the ATCO to address a specific pilot.
We propose a new call-sign recognition and understanding (CRU) system that addresses this issue.
The recognizer is trained to identify call-signs in noisy ATC transcripts and convert them into the standard International Civil Aviation Organization (ICAO) format.
arXiv Detail & Related papers (2022-04-13T11:30:42Z)
- BERTraffic: A Robust BERT-Based Approach for Speaker Change Detection and Role Identification of Air-Traffic Communications [2.270534915073284]
When the Speech Activity Detection (SAD) or diarization system fails, two or more single-speaker segments can end up in the same recording.
We developed a system that combines the segmentation of a SAD module with a BERT-based model that performs Speaker Change Detection (SCD) and Speaker Role Identification (SRI) based on ASR transcripts (i.e., diarization + SRI).
The proposed model reaches up to 0.90/0.95 F1-score on ATCO/pilot for SRI on several test sets.
arXiv Detail & Related papers (2021-10-12T07:25:12Z)
- Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining [64.35907499990455]
We propose a framework to learn semantics directly from speech with semi-supervision from transcribed or untranscribed speech.
Our framework is built upon pretrained end-to-end (E2E) ASR and self-supervised language models, such as BERT.
In parallel, we identify two essential criteria for evaluating SLU models: environmental noise-robustness and E2E semantics evaluation.
arXiv Detail & Related papers (2020-10-26T18:21:27Z)