StableEmit: Selection Probability Discount for Reducing Emission Latency
of Streaming Monotonic Attention ASR
- URL: http://arxiv.org/abs/2107.00635v1
- Date: Thu, 1 Jul 2021 17:49:31 GMT
- Title: StableEmit: Selection Probability Discount for Reducing Emission Latency
of Streaming Monotonic Attention ASR
- Authors: Hirofumi Inaguma, Tatsuya Kawahara
- Abstract summary: We propose a simple alignment-free regularization method, StableEmit, to encourage MoChA to emit tokens earlier.
We show that StableEmit significantly reduces the recognition errors and the emission latency simultaneously.
- Score: 46.69852287267763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While attention-based encoder-decoder (AED) models have been successfully
extended to the online variants for streaming automatic speech recognition
(ASR), such as monotonic chunkwise attention (MoChA), the models still have a
large label emission latency because of the unconstrained end-to-end training
objective. Previous works tackled this problem by leveraging alignment
information to control the timing to emit tokens during training. In this work,
we propose a simple alignment-free regularization method, StableEmit, to
encourage MoChA to emit tokens earlier. StableEmit discounts the selection
probabilities in hard monotonic attention for token boundary detection by a
constant factor and regularizes them to recover the total attention mass during
training. As a result, the scale of the selection probabilities is increased,
and the values can reach a threshold for token emission earlier, leading to a
reduction of emission latency and deletion errors. Moreover, StableEmit can be
combined with methods that constrain alignments to further improve the
accuracy and latency. Experimental evaluations with LSTM and Conformer encoders
demonstrate that StableEmit significantly reduces the recognition errors and
the emission latency simultaneously. We also show that the use of alignment
information is complementary in both metrics.
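The core mechanism described in the abstract can be sketched in a few lines. This is a minimal, hedged illustration under my own simplifications, not the paper's implementation: `lam` stands in for the constant discount factor, the selection probabilities are assumed to come from hard monotonic attention, and the "attention mass" a token receives over encoder frames is modeled as sum_j p_j * prod_{k<j}(1 - p_k), penalized for falling short of 1.

```python
def discount_selection_probs(selection_probs, lam=0.1):
    """Apply a StableEmit-style constant discount (1 - lam) to the hard
    monotonic attention selection probabilities during training.
    No discount is applied at inference, so probabilities trained to
    compensate for the discount cross the emission threshold earlier.
    `lam` is an illustrative hyperparameter name, not the paper's."""
    return [(1.0 - lam) * p for p in selection_probs]


def attention_mass_penalty(selection_probs):
    """Regularization sketch: the expected attention mass for one output
    token over the encoder frames is sum_j p_j * prod_{k<j}(1 - p_k);
    penalizing its shortfall from 1 pushes training to recover the mass
    lost to the discount, increasing the scale of the probabilities."""
    mass, survive = 0.0, 1.0
    for p in selection_probs:
        mass += survive * p       # token boundary selected at this frame
        survive *= (1.0 - p)      # boundary not yet selected
    return (1.0 - mass) ** 2
```

In training, the penalty would be computed on the discounted probabilities, so the raw (undiscounted) probabilities used at inference end up larger and hit the emission threshold on earlier frames.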
Related papers
- Breaking the Pre-Sampling Barrier: Activation-Informed Difficulty-Aware Self-Consistency [10.079669716138763]
Self-Consistency (SC) is an effective decoding strategy that improves the reasoning performance of Large Language Models (LLMs).
It suffers from substantial inference costs because it requires a large number of samples.
We propose Activation-Informed Difficulty-Aware Self-Consistency (ACTSC) to address these limitations.
arXiv Detail & Related papers (2026-02-10T06:05:11Z) - Reasoning Stabilization Point: A Training-Time Signal for Stable Evidence and Shortcut Reliance [0.0]
We define explanation drift as the epoch-to-epoch change in normalized token attributions on a fixed probe set.
RSP is computed from within-run drift dynamics and requires no tuning on out-of-distribution data.
arXiv Detail & Related papers (2026-01-12T17:48:05Z) - Stratified Hazard Sampling: Minimal-Variance Event Scheduling for CTMC/DTMC Discrete Diffusion and Flow Models [0.0]
Stratified Hazard Sampling (SHS) models per-token edits as events driven by cumulative hazard (CTMC) or cumulative jump mass (DTMC) and places events by stratifying this cumulative quantity.
We also introduce a phase-allocation variant for blacklist-style lexical constraints that prioritizes early edits at high-risk positions to mitigate late-masking artifacts.
arXiv Detail & Related papers (2026-01-06T08:19:02Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel.
Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding.
We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction [12.740812798007573]
Finite memory induces forgetfulness that harms retrieval-intensive tasks.
We explore a series of hybrid models that restore direct access to past tokens.
We propose a novel learnable token eviction approach.
arXiv Detail & Related papers (2025-10-23T17:53:03Z) - Intra-request branch orchestration for efficient LLM reasoning [52.68946975865865]
Large Language Models (LLMs) increasingly rely on inference-time reasoning algorithms to improve accuracy on complex tasks.
Prior work has largely focused on reducing token usage, often at the expense of accuracy, while overlooking other latency factors.
We present DUCHESS, an LLM serving system that reduces cost and latency without sacrificing accuracy through intra-request branch orchestration guided by predictions.
arXiv Detail & Related papers (2025-09-29T15:52:08Z) - TransDF: Time-Series Forecasting Needs Transformed Label Alignment [53.33409515800757]
We propose Transform-enhanced Direct Forecast (TransDF), which transforms the label sequence into decorrelated components with discriminated significance.
Models are trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount.
arXiv Detail & Related papers (2025-05-23T13:00:35Z) - Error-quantified Conformal Inference for Time Series [40.438171912992864]
Uncertainty quantification in time series prediction is challenging due to the temporal dependence and distribution shift on sequential data.
We propose Error-quantified Conformal Inference (ECI) by smoothing the quantile loss function.
ECI can achieve valid miscoverage control and output tighter prediction sets than other baselines.
arXiv Detail & Related papers (2025-02-02T15:02:36Z) - Quantized and Asynchronous Federated Learning [22.40154714677385]
We develop a novel scheme, Quantized Asynchronous Federated Learning (QAL), to deal with the communication bottleneck.
We prove that QAL achieves ergodic convergence without requiring uniform client arrivals.
We validate our theoretical findings by using standard benchmarks.
arXiv Detail & Related papers (2024-09-30T21:22:41Z) - DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays [26.032139258562708]
We propose DEER (Delay-resilient Enhanced RL), a framework designed to enhance interpretability and effectively address random delay issues.
In a variety of delayed scenarios, the trained encoder can seamlessly integrate with standard RL algorithms without requiring additional modifications.
The results confirm that DEER is superior to state-of-the-art RL algorithms in both constant and random delay settings.
arXiv Detail & Related papers (2024-06-05T09:45:26Z) - Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning [6.635084843592727]
We propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token.
SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism.
We develop the nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries.
arXiv Detail & Related papers (2024-05-26T13:06:45Z) - Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Measurement based estimator scheme for continuous quantum error
correction [52.77024349608834]
Canonical discrete quantum error correction (DQEC) schemes use projective von Neumann measurements on stabilizers to discretize the error syndromes into a finite set.
Quantum error correction (QEC) based on continuous measurement, known as continuous quantum error correction (CQEC), can be executed faster than DQEC and can also be resource efficient.
We show that by constructing a measurement-based estimator (MBE) of the logical qubit to be protected, it is possible to accurately track the errors occurring on the physical qubits in real time.
arXiv Detail & Related papers (2022-03-25T09:07:18Z) - FSR: Accelerating the Inference Process of Transducer-Based Models by
Applying Fast-Skip Regularization [72.9385528828306]
A typical transducer model decodes the output sequence conditioned on the current acoustic state.
The number of blank tokens in the prediction results accounts for nearly 90% of all tokens.
We propose a method named fast-skip regularization, which tries to align the blank position predicted by a transducer with that predicted by a CTC model.
arXiv Detail & Related papers (2021-04-07T03:15:10Z) - Alignment Knowledge Distillation for Online Streaming Attention-based
Speech Recognition [46.69852287267763]
This article describes an efficient training method for online streaming attention-based encoder-decoder (AED) automatic speech recognition (ASR) systems.
The proposed method significantly reduces recognition errors and emission latency simultaneously.
The best MoChA system shows performance comparable to that of RNN-transducer (RNN-T)
arXiv Detail & Related papers (2021-02-28T08:17:38Z) - FastEmit: Low-latency Streaming ASR with Sequence-level Emission
Regularization [78.46088089185156]
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible.
Existing approaches penalize emission delay by manipulating per-token or per-frame probability prediction in sequence transducer models.
We propose a sequence-level emission regularization method, named FastEmit, that applies latency regularization directly on per-sequence probability in training transducer models.
arXiv Detail & Related papers (2020-10-21T17:05:01Z)
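The FastEmit entry above describes regularizing latency at the sequence level rather than per token or per frame. The following is a deliberately simplified, hedged sketch of that idea under my own assumptions: I reduce it to an extra loss term that penalizes the probability mass a transducer assigns to delaying (blank) transitions along an alignment; the function name, inputs, and the scalar weight `lam` are illustrative, not the paper's gradient-level formulation.

```python
import math


def emission_delay_regularizer(emit_probs, lam=0.01):
    """Hedged sketch of a FastEmit-style emission regularizer.

    emit_probs[t] is the probability of emitting a label (rather than
    blank) at lattice node t. Penalizing the negative log-probability
    of emission along the path rewards earlier label emission; `lam`
    trades latency against accuracy."""
    delay_nll = -sum(math.log(p) for p in emit_probs)
    return lam * delay_nll
```

A path that always emits immediately (all probabilities 1) incurs zero penalty, while paths that lean on blank transitions are penalized in proportion to how unlikely emission was at each node.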
This list is automatically generated from the titles and abstracts of the papers in this site.