StableEmit: Selection Probability Discount for Reducing Emission Latency
of Streaming Monotonic Attention ASR
- URL: http://arxiv.org/abs/2107.00635v1
- Date: Thu, 1 Jul 2021 17:49:31 GMT
- Title: StableEmit: Selection Probability Discount for Reducing Emission Latency
of Streaming Monotonic Attention ASR
- Authors: Hirofumi Inaguma, Tatsuya Kawahara
- Abstract summary: We propose a simple alignment-free regularization method, StableEmit, to encourage MoChA to emit tokens earlier.
We show that StableEmit significantly reduces the recognition errors and the emission latency simultaneously.
- Score: 46.69852287267763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While attention-based encoder-decoder (AED) models have been successfully
extended to the online variants for streaming automatic speech recognition
(ASR), such as monotonic chunkwise attention (MoChA), the models still have a
large label emission latency because of the unconstrained end-to-end training
objective. Previous works tackled this problem by leveraging alignment
information to control the timing to emit tokens during training. In this work,
we propose a simple alignment-free regularization method, StableEmit, to
encourage MoChA to emit tokens earlier. StableEmit discounts the selection
probabilities in hard monotonic attention for token boundary detection by a
constant factor and regularizes them to recover the total attention mass during
training. As a result, the scale of the selection probabilities is increased,
and the values can reach a threshold for token emission earlier, leading to a
reduction of emission latency and deletion errors. Moreover, StableEmit can be
combined with methods that constrain alignments to further improve the
accuracy and latency. Experimental evaluations with LSTM and Conformer encoders
demonstrate that StableEmit significantly reduces the recognition errors and
the emission latency simultaneously. We also show that the use of alignment
information is complementary in both metrics.
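The core mechanism described in the abstract can be sketched in a few lines. This is a minimal, hedged illustration under my own simplifications, not the paper's implementation: `lam` stands in for the constant discount factor, the selection probabilities are assumed to come from hard monotonic attention, and the "attention mass" a token receives over encoder frames is modeled as sum_j p_j * prod_{k<j}(1 - p_k), penalized for falling short of 1.

```python
def discount_selection_probs(selection_probs, lam=0.1):
    """Apply a StableEmit-style constant discount (1 - lam) to the hard
    monotonic attention selection probabilities during training.
    No discount is applied at inference, so probabilities trained to
    compensate for the discount cross the emission threshold earlier.
    `lam` is an illustrative hyperparameter name, not the paper's."""
    return [(1.0 - lam) * p for p in selection_probs]


def attention_mass_penalty(selection_probs):
    """Regularization sketch: the expected attention mass for one output
    token over the encoder frames is sum_j p_j * prod_{k<j}(1 - p_k);
    penalizing its shortfall from 1 pushes training to recover the mass
    lost to the discount, increasing the scale of the probabilities."""
    mass, survive = 0.0, 1.0
    for p in selection_probs:
        mass += survive * p       # token boundary selected at this frame
        survive *= (1.0 - p)      # boundary not yet selected
    return (1.0 - mass) ** 2
```

In training, the penalty would be computed on the discounted probabilities, so the raw (undiscounted) probabilities used at inference end up larger and hit the emission threshold on earlier frames.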
Related papers
- Breaking the Pre-Sampling Barrier: Activation-Informed Difficulty-Aware Self-Consistency [10.079669716138763]
Self-Consistency (SC) is an effective decoding strategy that improves the reasoning performance of Large Language Models (LLMs).
It suffers from substantial inference costs because it requires a large number of samples.
We propose Activation-Informed Difficulty-Aware Self-Consistency (ACTSC) to address these limitations.
arXiv Detail & Related papers (2026-02-10T06:05:11Z) - Reasoning Stabilization Point: A Training-Time Signal for Stable Evidence and Shortcut Reliance [0.0]
We define explanation drift as the epoch-to-epoch change in normalized token attributions on a fixed probe set.
RSP is computed from within-run drift dynamics and requires no tuning on out-of-distribution data.
arXiv Detail & Related papers (2026-01-12T17:48:05Z) - Stratified Hazard Sampling: Minimal-Variance Event Scheduling for CTMC/DTMC Discrete Diffusion and Flow Models [0.0]
Stratified Hazard Sampling (SHS) models per-token edits as events driven by cumulative hazard (CTMC) or cumulative jump mass (DTMC) and places events by stratifying this cumulative quantity.
We also introduce a phase-allocation variant for blacklist-style lexical constraints that prioritizes early edits at high-risk positions to mitigate late-masking artifacts.
arXiv Detail & Related papers (2026-01-06T08:19:02Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel.
Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding.
We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction [12.740812798007573]
Finite memory induces forgetfulness that harms retrieval-intensive tasks.
We explore a series of hybrid models that restore direct access to past tokens.
We propose a novel learnable token eviction approach.
arXiv Detail & Related papers (2025-10-23T17:53:03Z) - Intra-request branch orchestration for efficient LLM reasoning [52.68946975865865]
Large Language Models (LLMs) increasingly rely on inference-time reasoning algorithms to improve accuracy on complex tasks.
Prior work has largely focused on reducing token usage, often at the expense of accuracy, while overlooking other latency factors.
We present DUCHESS, an LLM serving system that reduces cost and latency without sacrificing accuracy through intra-request branch orchestration guided by predictions.
arXiv Detail & Related papers (2025-09-29T15:52:08Z) - TransDF: Time-Series Forecasting Needs Transformed Label Alignment [53.33409515800757]
We propose Transform-enhanced Direct Forecast (TransDF), which transforms the label sequence into decorrelated components with discriminated significance.
Models are trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount.
arXiv Detail & Related papers (2025-05-23T13:00:35Z) - Error-quantified Conformal Inference for Time Series [40.438171912992864]
Uncertainty quantification in time series prediction is challenging due to the temporal dependence and distribution shift on sequential data.
We propose Error-quantified Conformal Inference (ECI) by smoothing the quantile loss function.
ECI can achieve valid miscoverage control and output tighter prediction sets than other baselines.
arXiv Detail & Related papers (2025-02-02T15:02:36Z) - Quantized and Asynchronous Federated Learning [22.40154714677385]
We develop a novel scheme, Quantized Asynchronous Federated Learning (QAL), to deal with the communication bottleneck.
We prove that QAL achieves ergodic convergence without requiring uniform client arrivals.
We validate our theoretical findings by using standard benchmarks.
arXiv Detail & Related papers (2024-09-30T21:22:41Z) - DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays [26.032139258562708]
We propose DEER (Delay-resilient Enhanced RL), a framework designed to enhance interpretability and effectively address random delay issues.
In a variety of delayed scenarios, the trained encoder can seamlessly integrate with standard RL algorithms without requiring additional modifications.
The results confirm that DEER is superior to state-of-the-art RL algorithms in both constant and random delay settings.
arXiv Detail & Related papers (2024-06-05T09:45:26Z) - Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning [6.635084843592727]
We propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token.
SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism.
We develop the nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries.
arXiv Detail & Related papers (2024-05-26T13:06:45Z) - Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Measurement based estimator scheme for continuous quantum error
correction [52.77024349608834]
Canonical discrete quantum error correction (DQEC) schemes use projective von Neumann measurements on stabilizers to discretize the error syndromes into a finite set.
Quantum error correction (QEC) based on continuous measurement, known as continuous quantum error correction (CQEC), can be executed faster than DQEC and can also be resource efficient.
We show that by constructing a measurement-based estimator (MBE) of the logical qubit to be protected, it is possible to accurately track the errors occurring on the physical qubits in real time.
arXiv Detail & Related papers (2022-03-25T09:07:18Z) - FSR: Accelerating the Inference Process of Transducer-Based Models by
Applying Fast-Skip Regularization [72.9385528828306]
A typical transducer model decodes the output sequence conditioned on the current acoustic state.
The number of blank tokens in the prediction results accounts for nearly 90% of all tokens.
We propose a method named fast-skip regularization, which tries to align the blank position predicted by a transducer with that predicted by a CTC model.
arXiv Detail & Related papers (2021-04-07T03:15:10Z) - Alignment Knowledge Distillation for Online Streaming Attention-based
Speech Recognition [46.69852287267763]
This article describes an efficient training method for online streaming attention-based encoder-decoder (AED) automatic speech recognition (ASR) systems.
The proposed method significantly reduces recognition errors and emission latency simultaneously.
The best MoChA system shows performance comparable to that of RNN-transducer (RNN-T)
arXiv Detail & Related papers (2021-02-28T08:17:38Z) - FastEmit: Low-latency Streaming ASR with Sequence-level Emission
Regularization [78.46088089185156]
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible.
Existing approaches penalize emission delay by manipulating per-token or per-frame probability prediction in sequence transducer models.
We propose a sequence-level emission regularization method, named FastEmit, that applies latency regularization directly on per-sequence probability in training transducer models.
arXiv Detail & Related papers (2020-10-21T17:05:01Z)
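The FastEmit entry above describes regularizing latency at the sequence level rather than per token or per frame. The following is a deliberately simplified, hedged sketch of that idea under my own assumptions: I reduce it to an extra loss term that penalizes the probability mass a transducer assigns to delaying (blank) transitions along an alignment; the function name, inputs, and the scalar weight `lam` are illustrative, not the paper's gradient-level formulation.

```python
import math


def emission_delay_regularizer(emit_probs, lam=0.01):
    """Hedged sketch of a FastEmit-style emission regularizer.

    emit_probs[t] is the probability of emitting a label (rather than
    blank) at lattice node t. Penalizing the negative log-probability
    of emission along the path rewards earlier label emission; `lam`
    trades latency against accuracy."""
    delay_nll = -sum(math.log(p) for p in emit_probs)
    return lam * delay_nll
```

A path that always emits immediately (all probabilities 1) incurs zero penalty, while paths that lean on blank transitions are penalized in proportion to how unlikely emission was at each node.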
This list is automatically generated from the titles and abstracts of the papers in this site.