Call-sign recognition and understanding for noisy air-traffic
transcripts using surveillance information
- URL: http://arxiv.org/abs/2204.06309v1
- Date: Wed, 13 Apr 2022 11:30:42 GMT
- Title: Call-sign recognition and understanding for noisy air-traffic
transcripts using surveillance information
- Authors: Alexander Blatt, Martin Kocour, Karel Veselý, Igor Szöke, Dietrich Klakow
- Abstract summary: Air traffic control (ATC) relies on communication via speech between pilot and air-traffic controller (ATCO).
The call-sign, as the unique identifier for each flight, is used by the ATCO to address a specific pilot.
We propose a new call-sign recognition and understanding (CRU) system that addresses this issue.
The recognizer is trained to identify call-signs in noisy ATC transcripts and convert them into the standard International Civil Aviation Organization (ICAO) format.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Air traffic control (ATC) relies on communication via speech between pilot
and air-traffic controller (ATCO). The call-sign, as the unique identifier for each
flight, is used to address a specific pilot by the ATCO. Extracting the
call-sign from the communication is a challenge because of the noisy ATC voice
channel and the additional noise introduced by the receiver. A low
signal-to-noise ratio (SNR) in the speech leads to transcripts with a high word
error rate (WER). We propose a new call-sign recognition and understanding (CRU)
system that addresses this issue. The recognizer is trained to identify
call-signs in noisy ATC transcripts and convert them into the standard
International Civil Aviation Organization (ICAO) format. By incorporating
surveillance information, we can increase the call-sign accuracy (CSA) by up to a
factor of four. The introduced data augmentation further improves performance on
high-WER transcripts and allows the model to be adapted to unseen
airspaces.
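The abstract describes two steps: turning a noisy transcript into an ICAO-format call-sign, and using surveillance information to improve that result. The Python below is a minimal, rule-based sketch of those ideas under stated assumptions, not the authors' CRU system, which the abstract describes as a trained recognizer. Everything in it is illustrative: the tiny telephony and digit tables, the hypothetical helper names (`spoken_to_icao`, `snap_to_surveillance`, `corrupt`, `call_sign_accuracy`), the character-level edit-distance snapping with `max_distance=2`, the word-level deletion/substitution augmentation, and the exact-match definition of CSA.

```python
import random
from typing import Iterable, List, Optional

# Hypothetical, deliberately tiny lookup tables (assumption, for illustration
# only). A real system would need the full airline telephony list and variants.
AIRLINE_TELEPHONY = {"lufthansa": "DLH", "speedbird": "BAW", "easy": "EZY"}
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "tree": "3",
          "four": "4", "five": "5", "fife": "5", "six": "6", "seven": "7",
          "eight": "8", "nine": "9", "niner": "9"}
NATO = {w: w[0].upper() for w in (
    "alfa alpha bravo charlie delta echo foxtrot golf hotel india juliett "
    "kilo lima mike november oscar papa quebec romeo sierra tango uniform "
    "victor whiskey xray yankee zulu").split()}


def spoken_to_icao(transcript: str) -> Optional[str]:
    """Map the first '<airline> <digits/letters>' run to ICAO form, e.g. DLH123."""
    words = transcript.lower().split()
    for i, word in enumerate(words):
        if word not in AIRLINE_TELEPHONY:
            continue
        suffix = []
        for token in words[i + 1:]:
            if token in DIGITS:
                suffix.append(DIGITS[token])
            elif token in NATO:
                suffix.append(NATO[token])
            else:
                break  # the call-sign ends at the first non-alphanumeric word
        if suffix:
            return AIRLINE_TELEPHONY[word] + "".join(suffix)
    return None


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def snap_to_surveillance(candidate: Optional[str], surveillance: Iterable[str],
                         max_distance: int = 2) -> Optional[str]:
    """Replace a possibly misrecognized call-sign by the closest call-sign that
    surveillance reports in the airspace, if it is close enough (assumed rule)."""
    known = sorted(surveillance)  # sorted so ties are broken deterministically
    if candidate is None or not known:
        return None
    best = min(known, key=lambda cs: edit_distance(candidate, cs))
    return best if edit_distance(candidate, best) <= max_distance else None


def corrupt(transcript: str, error_rate: float, rng: random.Random) -> str:
    """Toy augmentation (assumption, not the paper's method): randomly delete or
    substitute words to mimic high-WER ASR output, half of the budget each."""
    vocab = sorted(DIGITS) + sorted(NATO)
    out = []
    for word in transcript.split():
        r = rng.random()
        if r < error_rate / 2:
            continue  # deletion
        out.append(rng.choice(vocab) if r < error_rate else word)  # substitute or keep
    return " ".join(out)


def call_sign_accuracy(predicted: List[Optional[str]], reference: List[str]) -> float:
    """Fraction of utterances whose predicted call-sign exactly matches the reference."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


if __name__ == "__main__":
    radar = {"DLH123", "BAW45K", "EZY9QA"}  # hypothetical surveillance snapshot
    noisy = "lufthansa one to three descend flight level eight zero"  # "two" heard as "to"
    raw = spoken_to_icao(noisy)
    print(raw, "->", snap_to_surveillance(raw, radar))  # prints: DLH1 -> DLH123
```

Running the script prints `DLH1 -> DLH123`: the misrecognized "to" cuts the parsed call-sign short, and the closest call-sign reported by surveillance restores it. Snapping to the nearest known call-sign is only one simple way to exploit surveillance data; a learned model could instead weigh the surveillance list against the transcript rather than trusting a fixed distance threshold.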
Related papers
- Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control
We propose a transformer-based joint ASR-SRD system that solves both tasks jointly while relying on a standard ASR architecture.
We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets.
(arXiv, 2024-06-19)
- A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
We propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training.
The engine receives spoken communications from ATCo trainees and performs automatic speech recognition and understanding.
To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools.
(arXiv, 2023-04-16)
- Age of Information in Deep Learning-Driven Task-Oriented Communications
This paper studies the notion of age in task-oriented communications, which aim to execute a task at a receiver using the data available at its transmitter.
The transmitter-receiver operations are modeled as an encoder-decoder pair of deep neural networks (DNNs) that are jointly trained.
(arXiv, 2023-01-11)
- Task-Oriented Communications for NextG: End-to-End Deep Learning and AI Security Aspects
NextG communication systems are beginning to shift the design paradigm from reliably transferring bits to reliably executing a given task, as in task-oriented communications.
Wireless signal classification is considered the task for the NextG Radio Access Network (RAN), where edge devices collect wireless signals for spectrum awareness and communicate with the NextG base station (gNodeB), which needs to identify the signal label.
Task-oriented communications are realized by jointly training the transmitter, receiver and classifier functionalities as an encoder-decoder pair for the edge device and the gNodeB.
(arXiv, 2022-12-19)
- ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications
We introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging air traffic control (ATC) field.
The ATCO2 corpus is split into three subsets.
We expect the ATCO2 corpus will foster research on robust ASR and NLU.
(arXiv, 2022-11-08)
- A two-step approach to leverage contextual data: speech recognition in air-traffic communications
We show that combining the benefits of ASR and NLP methods considerably improves the recognition of callsigns.
Boosting callsign n-grams with the combination of ASR and NLP methods yields an improvement in callsign recognition of up to 53.7% absolute, or 60.4% relative.
(arXiv, 2022-02-08)
- CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
We introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR).
It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers.
We provide detailed statistics of both the clean and the augmented versions of our dataset.
(arXiv, 2022-01-11)
- BERTraffic: A Robust BERT-Based Approach for Speaker Change Detection and Role Identification of Air-Traffic Communications
Problems arise when the Speech Activity Detection (SAD) or diarization system fails and two or more single-speaker segments end up in the same recording.
We developed a system that combines the segmentation of a SAD module with a BERT-based model that performs Speaker Change Detection (SCD) and Speaker Role Identification (SRI) based on ASR transcripts (i.e., diarization + SRI).
The proposed model reaches SRI F1-scores of up to 0.90 for ATCO and 0.95 for pilot speech on several test sets.
(arXiv, 2021-10-12)
- Improving callsign recognition with air-surveillance data in air-traffic communication
Speech recognition can assist speech communication between pilots and air-traffic controllers.
High accuracy predictions are needed to minimize the risk of errors.
Our results show that surveillance data containing callsigns can considerably improve the recognition of a callsign in an utterance.
(arXiv, 2021-08-27)
- Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
The callsign used to address an airplane is an essential part of all ATCo-pilot communications.
We propose a two-step approach that adds contextual knowledge during semi-supervised training to reduce ASR system error rates.
(arXiv, 2021-04-08)
This list is automatically generated from the titles and abstracts of the papers on this site.