ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech
Recognition and Natural Language Understanding of Air Traffic Control
Communications
- URL: http://arxiv.org/abs/2211.04054v2
- Date: Thu, 15 Jun 2023 13:53:05 GMT
- Authors: Juan Zuluaga-Gomez and Karel Veselý and Igor Szöke and Alexander Blatt and Petr Motlicek and Martin Kocour and Mickael Rigault and Khalid Choukri and Amrutha Prasad and Seyyed Saeed Sarfjoo and Iuliia Nigmatulina and Claudia Cevenini and Pavel Kolčárek and Allan Tart and Jan Černocký and Dietrich Klakow
- Abstract summary: We introduce the ATCO2 corpus, a dataset that aims to foster research in the challenging air traffic control (ATC) field.
The ATCO2 corpus is split into three subsets.
We expect the ATCO2 corpus will foster research on robust ASR and NLU.
- Score: 51.24043482906732
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Personal assistants, automatic speech recognizers and dialogue understanding
systems are becoming more critical in our interconnected digital world. A clear
example is air traffic control (ATC) communications. ATC aims at guiding
aircraft and controlling the airspace in a safe and optimal manner. These
voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots over very high frequency (VHF) radio channels. Incorporating these novel technologies into ATC, a low-resource domain, requires large-scale annotated datasets to develop data-driven AI systems. Two examples are
automatic speech recognition (ASR) and natural language understanding (NLU). In
this paper, we introduce the ATCO2 corpus, a dataset that aims to foster research in the challenging ATC field, which has lagged behind due to a lack of
annotated data. The ATCO2 corpus covers 1) data collection and pre-processing,
2) pseudo-annotations of speech data, and 3) extraction of ATC-related named
entities. The ATCO2 corpus is split into three subsets. 1) ATCO2-test-set
corpus contains 4 hours of ATC speech with manual transcripts and a subset with
gold annotations for named-entity recognition (callsign, command, value). 2)
The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched
with automatic transcripts from an in-domain speech recognizer, contextual
information, speaker turn information, signal-to-noise ratio estimate and
English language detection score per sample. Both subsets are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus, which we offer for free at https://www.atco2.org/data. We expect that the ATCO2 corpus will foster research on robust ASR and NLU, not only in the field of ATC communications but also in the general research community.
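The ATCO2-PL-set described above ships a signal-to-noise ratio estimate and an English language detection score with every pseudo-labeled sample, which makes it natural to threshold on these signals before training. The sketch below is a minimal illustration of that idea; the field names (`snr_db`, `english_score`) and the record layout are illustrative assumptions, not the corpus's actual schema.

```python
# Hypothetical sketch: selecting high-quality pseudo-labeled utterances from a
# corpus like ATCO2-PL-set. Field names ("snr_db", "english_score") are
# illustrative assumptions, not the corpus's actual metadata schema.

def select_utterances(samples, min_snr_db=10.0, min_english_score=0.5):
    """Keep samples whose SNR estimate and English-detection score pass thresholds."""
    return [
        s for s in samples
        if s["snr_db"] >= min_snr_db and s["english_score"] >= min_english_score
    ]

if __name__ == "__main__":
    samples = [
        {"id": "utt-001", "snr_db": 15.2, "english_score": 0.93,
         "transcript": "lufthansa three two one descend flight level eight zero"},
        {"id": "utt-002", "snr_db": 4.1, "english_score": 0.88,
         "transcript": "ryanair seven alpha bravo contact tower"},
        {"id": "utt-003", "snr_db": 12.7, "english_score": 0.31,
         "transcript": "(non-English or misdetected speech)"},
    ]
    kept = select_utterances(samples)
    print([s["id"] for s in kept])  # only utt-001 passes both thresholds
```

In practice the thresholds trade pseudo-label quality against the amount of training data retained, so they would be tuned on a held-out set such as ATCO2-test-set.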
Related papers
- Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control [60.35553925189286]
We propose a transformer-based joint ASR-SRD system that solves both tasks jointly while relying on a standard ASR architecture.
We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets.
arXiv Detail & Related papers (2024-06-19T21:11:01Z)
- Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding [3.4713477325880464]
The ATCO2 project aimed to develop a unique platform to collect and preprocess large amounts of ATC data from the airspace in real time.
This paper reviews previous work from ATCO2 partners, including robust automatic speech recognition.
We believe that the pipeline developed during the ATCO2 project, along with the open-sourcing of its data, will encourage research in the ATC field.
arXiv Detail & Related papers (2023-05-02T02:04:33Z)
- A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers [0.797970449705065]
We propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training.
The engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding.
To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools.
arXiv Detail & Related papers (2023-04-16T17:45:21Z)
- Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator [0.5480546613836199]
This paper describes a simple yet efficient repetition-based modular system for speeding up the training of air traffic controllers (ATCos).
E.g., a human pilot is still required in EUROCONTROL's ESCAPE lite simulator (see https://www.eurocontrol.int/simulator/escape) during ATCo training.
This need can be substituted by an automatic system that could act as a pilot.
arXiv Detail & Related papers (2022-12-14T11:34:59Z)
- Call-sign recognition and understanding for noisy air-traffic transcripts using surveillance information [72.20674534231314]
Air traffic control (ATC) relies on spoken communication between the pilot and the air traffic controller (ATCO).
The call-sign, as unique identifier for each flight, is used to address a specific pilot by the ATCO.
We propose a new call-sign recognition and understanding (CRU) system that addresses this issue.
The recognizer is trained to identify call-signs in noisy ATC transcripts and convert them into the standard International Civil Aviation Organization (ICAO) format.
arXiv Detail & Related papers (2022-04-13T11:30:42Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition [91.33781557979819]
We introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR).
It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers.
We provide detailed statistics of both the clean and the augmented versions of our dataset.
arXiv Detail & Related papers (2022-01-11T06:32:12Z)
- Grammar Based Identification Of Speaker Role For Improving ATCO And Pilot ASR [1.1391158217994781]
Assistant Based Speech Recognition (ABSR) for air traffic control is generally trained by pooling both Air Traffic Controller (ATCO) and pilot data.
Due to the data imbalance between ATCO and pilot recordings and varying acoustic conditions, ASR performance is usually significantly better for ATCOs than for pilots.
arXiv Detail & Related papers (2021-08-27T08:40:08Z)
- Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems [0.6465251961564605]
The callsign used to address an airplane is an essential part of all ATCo-pilot communications.
We propose a two-step approach to add contextual knowledge during semi-supervised training to reduce the ASR system error rates.
arXiv Detail & Related papers (2021-04-08T09:53:54Z)
- Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining [64.35907499990455]
We propose a framework to learn semantics directly from speech with semi-supervision from transcribed or untranscribed speech.
Our framework is built upon pretrained end-to-end (E2E) ASR and self-supervised language models, such as BERT.
In parallel, we identify two essential criteria for evaluating SLU models: environmental noise-robustness and E2E semantics evaluation.
arXiv Detail & Related papers (2020-10-26T18:21:27Z)
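Two of the related works above deal with recognizing spoken call-signs and converting them into the standard ICAO format. A minimal sketch of such a normalization is shown below; the telephony-designator table and the word lists are small illustrative assumptions (DLH, RYR, and BAW are real ICAO codes, but a complete system would use the full ICAO designator inventory and the full NATO alphabet).

```python
# Hedged sketch of call-sign normalization: mapping a spoken call-sign to the
# ICAO format. The tables below are small illustrative subsets, not a complete
# inventory of airline telephony designators or spelling-alphabet words.

TELEPHONY_TO_ICAO = {"lufthansa": "DLH", "ryanair": "RYR", "speedbird": "BAW"}
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}
# NATO spelling-alphabet letters used in alphanumeric suffixes (subset shown).
LETTERS = {"alpha": "A", "bravo": "B", "charlie": "C", "delta": "D"}

def to_icao(spoken: str) -> str:
    """Convert e.g. 'lufthansa three two one' to 'DLH321'."""
    out = []
    for word in spoken.lower().split():
        if word in TELEPHONY_TO_ICAO:
            out.append(TELEPHONY_TO_ICAO[word])
        elif word in DIGITS:
            out.append(DIGITS[word])
        elif word in LETTERS:
            out.append(LETTERS[word])
        # unknown words are skipped in this minimal sketch
    return "".join(out)

print(to_icao("lufthansa three two one"))    # DLH321
print(to_icao("ryanair seven alpha bravo"))  # RYR7AB
```

Real systems additionally exploit surveillance data (the list of call-signs actually present in the airspace) to constrain and correct the recognized call-sign, as described in the related work above.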
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.