Improving callsign recognition with air-surveillance data in air-traffic communication
- URL: http://arxiv.org/abs/2108.12156v1
- Date: Fri, 27 Aug 2021 07:56:47 GMT
- Title: Improving callsign recognition with air-surveillance data in air-traffic communication
- Authors: Iuliia Nigmatulina, Rudolf Braun, Juan Zuluaga-Gomez, Petr Motlicek
- Abstract summary: Speech recognition can be used to assist speech communication between pilots and air-traffic controllers.
High-accuracy predictions are needed to minimize the risk of errors.
Our results show that surveillance data containing callsigns can considerably improve the recognition of a callsign in an utterance.
- Score: 1.6058099298620423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Speech Recognition (ASR) can be used to assist speech
communication between pilots and air-traffic controllers. Its application can
significantly reduce the complexity of the task and increase the reliability of
transmitted information. High-accuracy predictions are therefore needed to
minimize the risk of errors, especially when recognizing key information, such
as the commands and callsigns used to navigate pilots. Our results show that
surveillance data containing callsigns can considerably improve the recognition
of a callsign in an utterance when the weights (costs) of probable callsign
n-grams are reduced per utterance, i.e., when those n-grams are boosted. In this
paper, we investigate two approaches: (1) G-boosting, where callsign weights are
adjusted at the language-model level (G), followed by a dynamic decoder with
on-the-fly composition, and (2) lattice rescoring, where callsign information is
introduced on top of lattices generated by a conventional decoder. Boosting
callsign n-grams with a combination of the two methods yielded a 28.4% absolute
improvement in callsign recognition accuracy and up to a 74.2% relative
improvement in the WER of callsign recognition.
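The idea of lowering the cost of probable callsign n-grams per utterance can be illustrated on a much simpler structure than a lattice. The sketch below rescores an n-best list instead of a lattice, a deliberate simplification and not the paper's method. It assumes that each hypothesis carries a total cost (negative log-likelihood, lower is better), that the surveillance callsigns for the utterance are already verbalized into words, and that a hypothetical fixed `bonus` is subtracted once for every distinct callsign n-gram a hypothesis contains. The helper names `ngrams`, `callsign_ngrams`, and `rescore` are illustrative only.

```python
from typing import Iterable

# Hypothetical sketch: per-utterance callsign boosting applied to an n-best
# list, a simplified stand-in for lattice rescoring.
# Costs are negative log-likelihoods, so a lower cost means a better hypothesis.

NGram = tuple[str, ...]


def ngrams(words: list[str], n: int) -> set[NGram]:
    """Return the distinct n-grams of length n in a word sequence."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def callsign_ngrams(callsigns: Iterable[list[str]], max_n: int = 3) -> set[NGram]:
    """Collect every 2- to max_n-gram from the verbalized surveillance callsigns."""
    grams: set[NGram] = set()
    for words in callsigns:
        for n in range(2, max_n + 1):
            grams |= ngrams(words, n)
    return grams


def rescore(nbest: list[tuple[str, float]],
            callsigns: Iterable[list[str]],
            bonus: float = 1.0,
            max_n: int = 3) -> list[tuple[str, float]]:
    """Subtract `bonus` per matched callsign n-gram; sort best (lowest cost) first."""
    boost_set = callsign_ngrams(callsigns, max_n)
    rescored = []
    for text, cost in nbest:
        words = text.split()
        hits = sum(len(ngrams(words, n) & boost_set) for n in range(2, max_n + 1))
        rescored.append((text, cost - bonus * hits))
    return sorted(rescored, key=lambda hyp: hyp[1])


if __name__ == "__main__":
    # Callsigns assumed to be known from surveillance data for this utterance.
    surveillance = [["lufthansa", "four", "alfa", "bravo"],
                    ["swiss", "one", "two", "three"]]
    nbest = [("lufthansa four alfa delta descend flight level eight zero", 10.0),
             ("lufthansa four alfa bravo descend flight level eight zero", 10.8)]
    for text, cost in rescore(nbest, surveillance):
        print(f"{cost:.2f}  {text}")
    # The "alfa bravo" hypothesis matches the surveillance callsign and now
    # ranks first despite its worse original cost.
```

A rescoring step like this can only reorder hypotheses that the decoder already produced. Adjusting the n-gram weights inside the language model itself, as in G-boosting, also changes which hypotheses the search explores in the first place.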
Related papers
- InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions [5.50485371072671]
Our method improves the recognition accuracy of misrecognized target keywords by substituting intermediate CTC predictions with corrected labels.
Experiments conducted in Japanese demonstrated that our method successfully improved the F1 score for unknown words.
arXiv Detail & Related papers (2024-06-21T06:25:10Z)
- Distillation-guided Representation Learning for Unconstrained Gait Recognition [50.0533243584942]
We propose a framework, termed GAit DEtection and Recognition (GADER), for human authentication in challenging outdoor scenarios.
GADER builds discriminative features through a novel gait recognition method, where only frames containing gait information are used.
We evaluate our method on multiple state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets.
arXiv Detail & Related papers (2023-07-27T01:53:57Z)
- On the Importance of Signer Overlap for Sign Language Detection [65.26091369630547]
We argue that the current benchmark data sets for sign language detection yield overly optimistic results that do not generalize well.
We quantify this with a detailed analysis of the effect of signer overlap on current sign detection benchmark data sets.
We propose new data set partitions that are free of overlap and allow for more realistic performance assessment.
arXiv Detail & Related papers (2023-03-19T22:15:05Z)
- Evaluating Automatic Speech Recognition in an Incremental Setting [0.7734726150561086]
We systematically evaluate six speech recognizers using metrics including word error rate, latency, and the number of updates to already recognized words on English test data.
We find that, generally, local recognizers are faster and require fewer updates than cloud-based recognizers.
arXiv Detail & Related papers (2023-02-23T14:22:40Z)
- Call-sign recognition and understanding for noisy air-traffic transcripts using surveillance information [72.20674534231314]
Air traffic control (ATC) relies on communication via speech between pilot and air-traffic controller (ATCO).
The call-sign, as the unique identifier for each flight, is used by the ATCO to address a specific pilot.
We propose a new call-sign recognition and understanding (CRU) system that addresses the difficulty of extracting call-signs from noisy transcripts.
The recognizer is trained to identify call-signs in noisy ATC transcripts and convert them into the standard International Civil Aviation Organization (ICAO) format.
arXiv Detail & Related papers (2022-04-13T11:30:42Z)
- Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
- Label Semantics for Few Shot Named Entity Recognition [68.01364012546402]
We study the problem of few-shot learning for named entity recognition.
We leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors.
Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder.
arXiv Detail & Related papers (2022-03-16T23:21:05Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z)
- A two-step approach to leverage contextual data: speech recognition in air-traffic communications [1.3229510087215552]
We show that combining the benefits of ASR and NLP methods considerably improves the recognition of callsigns.
Boosting callsign n-grams with a combination of ASR and NLP methods yields up to a 53.7% absolute, or 60.4% relative, improvement in callsign recognition.
arXiv Detail & Related papers (2022-02-08T08:59:54Z)
- Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems [0.6465251961564605]
The callsign used to address an airplane is an essential part of all ATCo-pilot communications.
We propose a two-step approach to add contextual knowledge during semi-supervised training to reduce the ASR system error rates.
arXiv Detail & Related papers (2021-04-08T09:53:54Z)
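The call-sign recognition and understanding (CRU) entry in this list describes converting recognized call-signs into the standard ICAO format. The sketch below shows only the simplest rule-based form of that mapping; it is not the CRU system, which is a trained recognizer. The three-airline `AIRLINES` table and the `to_icao` helper are hypothetical illustrations, while the digit words and the ICAO spelling alphabet follow standard radiotelephony usage.

```python
from typing import Optional

# Hypothetical, rule-based sketch of mapping a spoken call-sign to ICAO format.
# AIRLINES is a tiny illustrative sample of telephony designators, not a full table.
AIRLINES = {"lufthansa": "DLH", "speedbird": "BAW", "swiss": "SWR"}

# Digits as spoken in ATC, including common spellings of the ICAO
# pronunciations "tree", "fower", "fife" and "niner".
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "tree": "3",
          "four": "4", "fower": "4", "five": "5", "fife": "5", "six": "6",
          "seven": "7", "eight": "8", "nine": "9", "niner": "9"}

# ICAO spelling alphabet (plus two common variants): each word maps to its first letter.
LETTERS = {word: word[0].upper() for word in (
    "alfa", "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliett", "juliet", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform", "victor",
    "whiskey", "xray", "yankee", "zulu")}


def to_icao(words: list[str]) -> Optional[str]:
    """Convert e.g. ['lufthansa', 'four', 'alfa', 'bravo', ...] to 'DLH4AB'.

    Returns None if the first word is not a known airline telephony
    designator or if no digits or letters follow it.
    """
    if not words or words[0] not in AIRLINES:
        return None
    suffix = []
    for word in words[1:]:
        if word in DIGITS:
            suffix.append(DIGITS[word])
        elif word in LETTERS:
            suffix.append(LETTERS[word])
        else:
            break  # the call-sign ends at the first word that is not a digit or letter
    return AIRLINES[words[0]] + "".join(suffix) if suffix else None


if __name__ == "__main__":
    print(to_icao("lufthansa four alfa bravo descend flight level eight zero".split()))  # DLH4AB
```

Stopping at the first non-call-sign word is the weakest part of this rule: if an instruction starting with a digit or spelled letter directly followed the call-sign, it would be absorbed into the suffix, which is one reason a trained recognizer is preferable on noisy transcripts.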