A two-step approach to leverage contextual data: speech recognition in
air-traffic communications
- URL: http://arxiv.org/abs/2202.03725v1
- Date: Tue, 8 Feb 2022 08:59:54 GMT
- Title: A two-step approach to leverage contextual data: speech recognition in
air-traffic communications
- Authors: Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed
Sarfjoo, Petr Motlicek
- Abstract summary: We prove that combining the benefits of ASR and NLP methods helps to considerably improve the recognition of callsigns.
Boosting callsign n-grams with the combination of ASR and NLP methods leads up to 53.7% of an absolute, or 60.4% of a relative, improvement in callsign recognition.
- Score: 1.3229510087215552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Speech Recognition (ASR), as the assistance of speech communication
between pilots and air-traffic controllers, can significantly reduce the
complexity of the task and increase the reliability of transmitted information.
ASR application can lead to a lower number of incidents caused by
misunderstanding and improve air traffic management (ATM) efficiency.
Evidently, high accuracy predictions, especially, of key information, i.e.,
callsigns and commands, are required to minimize the risk of errors. We prove
that combining the benefits of ASR and Natural Language Processing (NLP)
methods to make use of surveillance data (i.e. additional modality) helps to
considerably improve the recognition of callsigns (named entity). In this
paper, we investigate a two-step callsign boosting approach: (1) at the 1 step
(ASR), weights of probable callsign n-grams are reduced in G.fst and/or in the
decoding FST (lattices), (2) at the 2 step (NLP), callsigns extracted from the
improved recognition outputs with Named Entity Recognition (NER) are correlated
with the surveillance data to select the most suitable one. Boosting callsign
n-grams with the combination of ASR and NLP methods eventually leads up to
53.7% of an absolute, or 60.4% of a relative, improvement in callsign
recognition.
Related papers
- Semantic Communication for Cooperative Perception using HARQ [51.148203799109304]
We leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework.
To counter the challenges posed by time-varying multipath fading, our approach incorporates the use of frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies.
We introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeated request (HARQ)
arXiv Detail & Related papers (2024-08-29T08:53:26Z) - Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement [53.044703127757295]
Unsupervised learning visible-infrared person re-identification (USL-VI-ReID) aims at learning modality-invariant features from unlabeled cross-modality dataset.
We propose a Dual Optimal Transport Label Assignment (DOTLA) framework to simultaneously assign the generated labels from one modality to its counterpart modality.
The proposed DOTLA mechanism formulates a mutual reinforcement and efficient solution to cross-modality data association, which could effectively reduce the side-effects of some insufficient and noisy label associations.
arXiv Detail & Related papers (2023-05-22T04:40:30Z) - Streaming End-to-End Multilingual Speech Recognition with Joint Language
Identification [14.197869575012925]
We propose to modify the structure of the cascaded-encoder-based recurrent neural network transducer (RNN-T) model by integrating a per-frame language identifier (LID) predictor.
RNN-T with cascaded encoders can achieve streaming ASR with low latency using first-pass decoding with no right-context, and achieve lower word error rates (WERs) using second-pass decoding with longer right-context.
Experimental results on a voice search dataset with 9 language locales shows that the proposed method achieves an average of 96.2% LID prediction accuracy and the same second-pass WER
arXiv Detail & Related papers (2022-09-13T15:10:41Z) - Call-sign recognition and understanding for noisy air-traffic
transcripts using surveillance information [72.20674534231314]
Air traffic control (ATC) relies on communication via speech between pilot and air-traffic controller (ATCO)
The call-sign, as unique identifier for each flight, is used to address a specific pilot by the ATCO.
We propose a new call-sign recognition and understanding (CRU) system that addresses this issue.
The recognizer is trained to identify call-signs in noisy ATC transcripts and convert them into the standard International Civil Aviation Organization (ICAO) format.
arXiv Detail & Related papers (2022-04-13T11:30:42Z) - The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming the state-of-the-arts that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z) - BERTraffic: A Robust BERT-Based Approach for Speaker Change Detection
and Role Identification of Air-Traffic Communications [2.270534915073284]
Speech Activity Detection (SAD) or diarization system fails and then two or more single speaker segments are in the same recording.
We developed a system that combines the segmentation of a SAD module with a BERT-based model that performs Speaker Change Detection (SCD) and Speaker Role Identification (SRI) based on ASR transcripts (i.e., diarization + SRI)
The proposed model reaches up to 0.90/0.95 F1-score on ATCO/pilot for SRI on several test sets.
arXiv Detail & Related papers (2021-10-12T07:25:12Z) - Improving callsign recognition with air-surveillance data in air-traffic
communication [1.6058099298620423]
Speech recognition can be used as the assistance of speech communication between pilots and air-traffic controllers.
High accuracy predictions are needed to minimize the risk of errors.
Our results prove that the surveillance data containing callsigns can help to considerably improve the recognition of a callsign in an utterance.
arXiv Detail & Related papers (2021-08-27T07:56:47Z) - Contextual Semi-Supervised Learning: An Approach To Leverage
Air-Surveillance and Untranscribed ATC Data in ASR Systems [0.6465251961564605]
The callsign used to address an airplane is an essential part of all ATCo-pilot communications.
We propose a two-steps approach to add contextual knowledge during semi-supervised training to reduce the ASR system error rates.
arXiv Detail & Related papers (2021-04-08T09:53:54Z) - Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR)
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU)
We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.