Advancing Airport Tower Command Recognition: Integrating Squeeze-and-Excitation and Broadcasted Residual Learning
- URL: http://arxiv.org/abs/2406.18313v2
- Date: Fri, 28 Jun 2024 11:04:19 GMT
- Title: Advancing Airport Tower Command Recognition: Integrating Squeeze-and-Excitation and Broadcasted Residual Learning
- Authors: Yuanxi Lin, Tonglin Zhou, Yang Xiao,
- Abstract summary: This paper addresses challenges in speech command recognition, such as noisy environments and limited computational resources.
We create a dataset of standardized airport tower commands, including routine and emergency instructions.
We enhance broadcasted residual learning with squeeze-and-excitation and time-frame frequency-wise squeeze-and-excitation techniques, resulting in our BC-SENet model.
- Score: 3.4540938725122285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate recognition of aviation commands is vital for flight safety and efficiency, as pilots must follow air traffic control instructions precisely. This paper addresses challenges in speech command recognition, such as noisy environments and limited computational resources, by advancing keyword spotting technology. We create a dataset of standardized airport tower commands, including routine and emergency instructions. We enhance broadcasted residual learning with squeeze-and-excitation and time-frame frequency-wise squeeze-and-excitation techniques, resulting in our BC-SENet model. This model focuses on crucial information with fewer parameters. Our tests on five keyword spotting models, including BC-SENet, demonstrate superior accuracy and efficiency. These findings highlight the effectiveness of our model advancements in improving speech command recognition for aviation safety and efficiency in noisy, high-stakes environments. Additionally, BC-SENet shows comparable performance on the common Google Speech Command dataset.
Related papers
- Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers [55.93119122318983]
Air traffic control (ATC) demands high pressure control with consequences of an error.<n> Detecting stress is key point in maintaining high high safety standards of ATC.<n>Anonymizing ATC voice data is one way to comply with privacy restrictions.
arXiv Detail & Related papers (2025-07-10T11:48:29Z) - Explainable LiDAR 3D Point Cloud Segmentation and Clustering for Detecting Airplane-Generated Wind Turbulence [8.653321928148545]
We present an advanced, explainable machine learning method that utilizes Light Detection and Ranging (LiDAR) data for effective wake vortex detection.
A novel feature of our research is the use of a perturbation-based explanation technique, which clarifies the model's decision-making processes.
This combination of semantic segmentation and clustering for real-time wake vortex tracking significantly advances aviation safety measures.
arXiv Detail & Related papers (2025-03-01T14:51:31Z) - Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications [0.0]
This study presents the development of ASR models fine-tuned specifically for Southeast Asian accents.
Our research achieves significant improvements, achieving a Word Error Rate (WER) of 0.0982 or 9.82% on SEA-accented ATC speech.
The findings emphasize the need for noise-robust training techniques and region-specific datasets to improve transcription accuracy for non-Western accents in ATC communications.
arXiv Detail & Related papers (2025-02-27T17:35:59Z) - Multi-stage Learning for Radar Pulse Activity Segmentation [51.781832424705094]
Radio signal recognition is a crucial function in electronic warfare.
Precise identification and localisation of radar pulse activities are required by electronic warfare systems.
Deep learning-based radar pulse activity recognition methods have remained largely underexplored.
arXiv Detail & Related papers (2023-12-15T01:56:27Z) - VBSF-TLD: Validation-Based Approach for Soft Computing-Inspired Transfer
Learning in Drone Detection [0.0]
This paper presents a transfer-based drone detection scheme, which forms an integral part of a computer vision-based module.
By harnessing the knowledge of pre-trained models from a related domain, transfer learning enables improved results even with limited training data.
Notably, the scheme's effectiveness is highlighted by its IOU-based validation results.
arXiv Detail & Related papers (2023-06-11T22:30:23Z) - Call-sign recognition and understanding for noisy air-traffic
transcripts using surveillance information [72.20674534231314]
Air traffic control (ATC) relies on communication via speech between pilot and air-traffic controller (ATCO)
The call-sign, as unique identifier for each flight, is used to address a specific pilot by the ATCO.
We propose a new call-sign recognition and understanding (CRU) system that addresses this issue.
The recognizer is trained to identify call-signs in noisy ATC transcripts and convert them into the standard International Civil Aviation Organization (ICAO) format.
arXiv Detail & Related papers (2022-04-13T11:30:42Z) - Improving performance of aircraft detection in satellite imagery while
limiting the labelling effort: Hybrid active learning [0.9379652654427957]
In the defense domain, aircraft detection on satellite imagery is a valuable tool for analysts.
We propose a hybrid clustering active learning method to select the most relevant data to label.
We show that this method can provide better or competitive results compared to other active learning methods.
arXiv Detail & Related papers (2022-02-10T08:24:07Z) - CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command
Recognition [91.33781557979819]
We introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR)
It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers.
We provide detailed statistics of both the clean and the augmented versions of our dataset.
arXiv Detail & Related papers (2022-01-11T06:32:12Z) - Rethinking Drone-Based Search and Rescue with Aerial Person Detection [79.76669658740902]
The visual inspection of aerial drone footage is an integral part of land search and rescue (SAR) operations today.
We propose a novel deep learning algorithm to automate this aerial person detection (APD) task.
We present the novel Aerial Inspection RetinaNet (AIR) algorithm as the combination of these contributions.
arXiv Detail & Related papers (2021-11-17T21:48:31Z) - BERTraffic: A Robust BERT-Based Approach for Speaker Change Detection
and Role Identification of Air-Traffic Communications [2.270534915073284]
Speech Activity Detection (SAD) or diarization system fails and then two or more single speaker segments are in the same recording.
We developed a system that combines the segmentation of a SAD module with a BERT-based model that performs Speaker Change Detection (SCD) and Speaker Role Identification (SRI) based on ASR transcripts (i.e., diarization + SRI)
The proposed model reaches up to 0.90/0.95 F1-score on ATCO/pilot for SRI on several test sets.
arXiv Detail & Related papers (2021-10-12T07:25:12Z) - Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Adversarial Reinforced Instruction Attacker for Robust Vision-Language
Navigation [145.84123197129298]
Language instruction plays an essential role in the natural language grounded navigation tasks.
We exploit to train a more robust navigator which is capable of dynamically extracting crucial factors from the long instruction.
Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target.
arXiv Detail & Related papers (2021-07-23T14:11:31Z) - Contextual Semi-Supervised Learning: An Approach To Leverage
Air-Surveillance and Untranscribed ATC Data in ASR Systems [0.6465251961564605]
The callsign used to address an airplane is an essential part of all ATCo-pilot communications.
We propose a two-steps approach to add contextual knowledge during semi-supervised training to reduce the ASR system error rates.
arXiv Detail & Related papers (2021-04-08T09:53:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.