Related papers: In-domain SSL pre-training and streaming ASR

In-domain SSL pre-training and streaming ASR

URL: http://arxiv.org/abs/2509.12101v1
Date: Mon, 15 Sep 2025 16:25:43 GMT
Title: In-domain SSL pre-training and streaming ASR
Authors: Jarod Duret, Salima Mdhaffar, Gaëlle Laperrière, Ryan Whetten, Audrey Galametz, Catherine Kobus, Marion-Cécile Martin, Jo Oleiwan, Yannick Estève,
Abstract summary: We train BEST-RQ models on 4.5k hours of unlabeled ATC data, then fine-tune on a smaller supervised ATC set.<n>We compare these in-domain SSL models against state-of-the-art, general-purpose speech encoders such as w2v-BERT 2.0 and HuBERT.<n>Results show that domain-adapted pre-training substantially improves performance on standard ATC benchmarks.
Score: 11.496573723272796
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this study, we investigate the benefits of domain-specific self-supervised pre-training for both offline and streaming ASR in Air Traffic Control (ATC) environments. We train BEST-RQ models on 4.5k hours of unlabeled ATC data, then fine-tune on a smaller supervised ATC set. To enable real-time processing, we propose using chunked attention and dynamic convolutions, ensuring low-latency inference. We compare these in-domain SSL models against state-of-the-art, general-purpose speech encoders such as w2v-BERT 2.0 and HuBERT. Results show that domain-adapted pre-training substantially improves performance on standard ATC benchmarks, significantly reducing word error rates when compared to models trained on broad speech corpora. Furthermore, the proposed streaming approach further improves word error rate under tighter latency constraints, making it particularly suitable for safety-critical aviation applications. These findings highlight that specializing SSL representations for ATC data is a practical path toward more accurate and efficient ASR systems in real-world operational settings.

Related papers

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
We propose textbfTACO, a test-time-scaling framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks.<n>Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL), and being gradient-free, it incurs significant computational benefits.
arXiv Detail & Related papers (2025-12-02T14:42:54Z)
Trajectory Design for UAV-Based Low-Altitude Wireless Networks in Unknown Environments: A Digital Twin-Assisted TD3 Approach [62.11847362756054]
Unmanned aerial vehicles (UAVs) are emerging as key enablers for low-altitude wireless network (LAWN)<n>We propose a digital twin (DT)-assisted training and deployment framework.<n>In this framework, the UAV transmits integrated sensing and communication signals to provide communication services to ground users, while simultaneously collecting echoes that are uploaded to the DT server to progressively construct virtual environments (VEs)<n>These VEs accelerate model training and are continuously updated with real-time UAV sensing data during deployment, supporting decision-making and enhancing flight safety.
arXiv Detail & Related papers (2025-10-28T10:05:53Z)
Joint Communication Scheduling and Velocity Control for Multi-UAV-Assisted Post-Disaster Monitoring: An Attention-Based In-Context Learning Approach [13.429550708956294]
Unmanned Aerial Vehicles (UAVs) are increasingly being investigated to collect sensory data in post-disaster monitoring scenarios, such as tsunamis.<n>A major challenge is to design the data collection schedules and flight velocities, as unfavorable schedules and velocities can lead to transmission errors and buffer overflows of the ground sensors.<n>This paper proposes a joint optimization of data collection schedules and velocities control for multiple UAVs to minimize data loss.
arXiv Detail & Related papers (2025-10-07T09:04:56Z)
Large Language Model-Empowered Decision Transformer for UAV-Enabled Data Collection [71.84636717632206]
Unmanned aerial vehicles (UAVs) for reliable and energy-efficient data collection from spatially distributed devices holds great promise in supporting Internet of Things (IoT) applications.<n>We propose a joint language model (LLM) to learn effective UAV control policies.<n>LLM-CRDT outperforms benchmark online and offline methods, achieving up to 36.7% higher energy efficiency than current state-of-the-art DT approaches.
arXiv Detail & Related papers (2025-09-17T13:05:08Z)
Age of Information Minimization in UAV-Enabled Integrated Sensing and Communication Systems [34.92822911897626]
Unmanned aerial vehicles (UAVs) equipped with integrated sensing and communication (ISAC) capabilities are envisioned to play a pivotal role in future wireless networks.<n>We propose Age Information (AoI) system that simultaneously performs target sensing and multi-user communication.
arXiv Detail & Related papers (2025-07-18T18:17:09Z)
FRSICL: LLM-Enabled In-Context Learning Flight Resource Allocation for Fresh Data Collection in UAV-Assisted Wildfire Monitoring [14.068881151569435]
Unmanned Aerial Vehicles (UAVs) are vital for public safety, particularly in wildfire monitoring, where early detection minimizes environmental impact.<n>In UAV-Assisted Wildfire Monitoring (UAWM) systems, joint optimization of sensor transmission scheduling and velocity is critical for minimizing Age of Information (AoI) from stale sensor data.<n>Deep Reinforcement Learning (DRL) has been used for such optimization; however, its limitations such as low sampling efficiency, simulation-to-reality gaps, and complex training render it unsuitable for time-critical applications like wildfire monitoring.<n>This paper introduces a new online Flight Resource Allocation scheme based on
arXiv Detail & Related papers (2025-07-14T10:24:43Z)
Few-Shot Radar Signal Recognition through Self-Supervised Learning and Radio Frequency Domain Adaptation [48.265859815346985]
Radar signal recognition plays a pivotal role in electronic warfare (EW)<n>Recent advances in deep learning have shown significant potential in improving radar signal recognition.<n>These methods fall short in EW scenarios where annotated radio frequency (RF) data are scarce or impractical to obtain.
arXiv Detail & Related papers (2025-01-07T01:35:56Z)
Automatic UAV-based Airport Pavement Inspection Using Mixed Real and Virtual Scenarios [3.0874677990361246]
We propose a vision-based approach to automatically identify pavement distress using images captured by UAVs. The proposed method is based on Deep Learning (DL) to segment defects in the image. We demonstrate that the use of a mixed dataset composed of synthetic and real training images yields better results when testing the training models in real application scenarios.
arXiv Detail & Related papers (2024-01-11T16:30:07Z)
A LSTM and Cost-Sensitive Learning-Based Real-Time Warning for Civil Aviation Over-limit [0.0]
A real-time warning model for civil aviation over-limit is proposed based on QAR data monitoring. The proposed model achieves an F1 score of 0.991 and an accuracy of 0.978, indicating its effectiveness in real-time warning of civil aviation over-limit.
arXiv Detail & Related papers (2023-05-08T10:56:06Z)
FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training a RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
Model-based Deep Learning Receiver Design for Rate-Splitting Multiple Access [65.21117658030235]
This work proposes a novel design for a practical RSMA receiver based on model-based deep learning (MBDL) methods. The MBDL receiver is evaluated in terms of uncoded Symbol Error Rate (SER), throughput performance through Link-Level Simulations (LLS) and average training overhead. Results reveal that the MBDL outperforms by a significant margin the SIC receiver with imperfect CSIR.
arXiv Detail & Related papers (2022-05-02T12:23:55Z)
How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications [1.3800173438685746]
We study the impact on performance when the data substantially differs between the pre-training and downstream fine-tuning phases. We benchmark the proposed models on four challenging ATC test sets. We also study the impact of fine-tuning data size on WERs, going from 5 minutes (few-shot) to 15 hours.
arXiv Detail & Related papers (2022-03-31T06:10:42Z)
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
Prediction of Traffic Flow via Connected Vehicles [77.11902188162458]
We propose a Short-term Traffic flow Prediction framework so that transportation authorities take early actions to control flow and prevent congestion. We anticipate flow at future time frames on a target road segment based on historical flow data and innovative features such as real time feeds and trajectory data provided by Connected Vehicles (CV) technology. We show how this novel approach allows advanced modelling by integrating into the forecasting of flow, the impact of various events that CV realistically encountered on segments along their trajectory.
arXiv Detail & Related papers (2020-07-10T16:00:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.